TITLE

A METHOD FOR HIGH-DENSITY MICROARRAY MEDIATED GENE EXPRESSION PROFILING

This application claims the benefit of U.S. Provisional Application
No. 60/159,898, filed October 15, 1999.

FIELD OF THE INVENTION 
This invention is in the field of bacterial gene expression. More
specifically, this invention is a method for the high density, microarray-mediated
gene expression profiling of Escherichia coli for comprehensive gene expression
analysis.

BACKGROUND OF THE INVENTION 
Escherichia coli has been exhaustively studied for over 50 years. Early
experiments measured the molecular fluxes from small compounds into
macromolecular constituents. These studies were followed by others in which
small molecule pools of central metabolic building blocks, nucleotides and amino
acids were determined. The levels of several macromolecular components,
including individual species of proteins, have been measured. Such measurements
of the steady state provide a census of the cellular content, while changes upon
imposition of a stress catalogue the cell's fight for survival. This response to an
insulting or adverse condition can take many forms, from relieving end product
inhibition to derepressing transcription.

In E. coli, experiments to define stress-related, global regulatory responses
have often relied upon one of two approaches. In the first, operon fusions induced
by a particular stress are isolated. In the second, proteomic measurements are made
in which the protein fractions from stressed and unstressed cultures are separated
by a two-dimensional method and then compared. Each method has an inherent
technological hurdle: for the former, the map location of responsive gene fusions
must be known precisely, and for the latter, induced or repressed proteins excised
from the two-dimensional gels must be identified.

Another method uses transposon-mediated mutagenesis (Spector et al., J.
Bacteriol. 170:345-351 (1988)). A reporter gene is inserted at a random location
in the genome using a transposon. By assaying for the reporter gene before and
after the treatment, genes affected by the treatment can be mapped and cloned by
using the linked transposon as a marker. However, this method is limited to
non-essential genes.

Alternatively, mRNA measurements utilizing techniques such as
hybridization to DNA and primer extension have allowed the monitoring of
individual genes' expression profiles. DeRisi et al. (Science 278:680-686 (1997))
reported the expression profiling of most yeast genes. The measurements were
facilitated by high-density arrays of individual genes and specific labeling of
cDNA copies of eukaryotic mRNA using polyA tail-specific primers. The lack of
a polyA tail and the extremely short bacterial mRNA half life represent hurdles for
the application of DNA micro-array technology to prokaryotic research.

A comprehensive expression profiling has been performed previously with
the yeast Saccharomyces cerevisiae. Adaptation of RNA isolation and labeling
protocols from eukaryotes to prokaryotes is not straightforward since eukaryotic
mRNA manipulations often exploit 3'-polyadenylation of this molecular species.

Chuang et al. (J. Bacteriol. 175:2026-2036 (1993)) reported an expression
profiling using large DNA fragments from an ordered λ library of E. coli genomic
fragments as a capture reagent. It allowed the comparison of the expression
patterns from large portions of DNA fragments by comparing mRNA levels from
stressed and unstressed E. coli cultures. The resolution of this method, however,
was unsatisfactory. Expression of groups of genes, as opposed to the expression
of each individual gene, was measured. Moreover, the method used radio-labeled
DNA as a probe with the incumbent need for safety precautions. Furthermore, the
use of radio-labeled probe prevents the simultaneous measurement of the
expression level in a test sample and a control sample.

Richmond et al. (Nucleic Acids Research, 19:3821-3835 (1999)) have
recently reported genome-wide expression profiling of E. coli at a single ORF
level of resolution. Changes in RNA levels after exposure to heat shock or IPTG
were analyzed using comprehensive low density blots of individual ORFs on a
nylon matrix and comprehensive high density arrays of individual ORFs spotted
on glass slides. The results of the two methods were compared.

The methods recited above permit monitoring of the effect of
environmental changes on gene expression by comparing expression levels of a
limited number of genes. They, however, fail to monitor the comprehensive
responses of a preponderance of individual genes in the genome of an organism in
a reliable, useful manner.

The problem to be solved, therefore, is to provide a way to measure the
comprehensive gene expression profile of an organism.

SUMMARY OF THE INVENTION 
The invention provides a method for identifying gene expression changes
within a bacterial species comprising:

(a) providing a comprehensive micro-array synthesized from DNA
comprised in a bacterial species;

(b) generating a first set of labeled probes from bacterial RNA, the
RNA isolated from the bacterial species of step (a);

(c) hybridizing the first set of labeled probes of step (b) to the
comprehensive micro-array of step (a), wherein hybridization
results in a detectable signal generated from the labeled probe;

(d) measuring the signal generated by the hybridization of the first set
of labeled probe to the comprehensive micro-array of step (c);

(e) subjecting the bacterial species of step (a) to a gene expression
altering condition whereby the gene expression profile of the
bacterial species is altered to produce a modified bacterial species;

(f) generating a second set of labeled probes from bacterial RNA, the
RNA isolated from the modified bacterial species of step (e);

(g) hybridizing the second set of labeled probes of step (f) to the
comprehensive micro-array of step (a), wherein hybridization
results in a detectable signal generated from the labeled probe;

(h) measuring the signal generated by the hybridization of the second
set of labeled probes to the comprehensive micro-array of step (g);
and

(i) comparing signal generated from the first hybridization to the
signal generated from the second hybridization to identify gene
expression changes within a bacterial species.
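
The comparison in step (i) can be sketched as a per-ORF ratio of the two measured signals. The following is only an illustration under stated assumptions: the signal values are invented, the `floor` that guards against dividing by near-zero background and the two-fold `min_fold` cutoff are hypothetical parameters, and no normalization between the two hybridizations is shown.

```python
import math

def log2_ratios(control, treated, floor=1.0):
    """Return {orf: log2(treated / control)} for ORFs measured in both hybridizations.

    `floor` (an assumed value) clamps very low signals so the ratio stays finite.
    """
    ratios = {}
    for orf in control.keys() & treated.keys():
        c = max(control[orf], floor)
        t = max(treated[orf], floor)
        ratios[orf] = math.log2(t / c)
    return ratios

def changed_orfs(ratios, min_fold=2.0):
    """List ORFs whose signal rose or fell by at least `min_fold` (assumed cutoff)."""
    threshold = math.log2(min_fold)
    return sorted(orf for orf, r in ratios.items() if abs(r) >= threshold)

# Invented signals: first hybridization (untreated) and second (treated culture).
control = {"lacZ": 120.0, "lacY": 90.0, "recA": 400.0}
treated = {"lacZ": 4800.0, "lacY": 1800.0, "recA": 410.0}

ratios = log2_ratios(control, treated)
print(changed_orfs(ratios))
```

Working in log space treats induction and repression symmetrically, which is why the cutoff is applied to the absolute value of the log ratio.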
Additionally, the invention provides a method for identifying gene
expression changes within a bacterial strain comprising:

(a) providing a comprehensive micro-array synthesized from DNA
comprised in a bacterial species;

(b) generating a first set of fluorescent cDNA from bacterial RNA, the
RNA isolated from the bacterial species of step (a);

(c) hybridizing the first set of fluorescent cDNA of step (b) to the
comprehensive micro-array of step (a), wherein hybridization
results in a detectable signal generated from the fluorescent cDNA;

(d) measuring the signal generated by the hybridization of the first set
of fluorescent cDNA to the comprehensive micro-array of step (c);

(e) subjecting the bacterial species of step (a) to a gene expression
altering condition whereby the gene expression profile of the
bacterial species is altered to produce a modified bacterial species;

(f) generating a second set of fluorescent cDNA from bacterial RNA,
the RNA isolated from the modified bacterial species of step (e);

(g) hybridizing the second set of fluorescent cDNA of step (f) to the
comprehensive micro-array of step (a), wherein hybridization
results in a detectable signal generated from the fluorescent cDNA;

(h) measuring the signal generated by the hybridization of the second
set of fluorescent cDNA to the comprehensive micro-array of
step (g); and

(i) comparing signal generated from the first hybridization to the
signal generated from the second hybridization to identify gene
expression changes within a bacterial species.

In an alternate embodiment the invention provides a method for
identifying gene expression changes within a genome comprising:

(a) providing a comprehensive micro-array synthesized from DNA
comprised in a prokaryotic or eukaryotic species;

(b) generating a control set of fluorescent cDNA from total or
polyadenylated RNA, the RNA isolated from the species of
step (a), the fluorescent cDNA comprising at least one first
fluorescent label and at least one different second fluorescent label;

(c) mixing the control set of fluorescent cDNA labeled with the at least
one first label with the control set of fluorescent cDNA labeled
with the at least one second label to form a dual labeled control
cDNA;

(d) hybridizing the dual labeled control set of fluorescent cDNA of
step (c) to the comprehensive micro-array of step (a), wherein
hybridization results in a detectable signal generated from the
fluorescent cDNA;

(e) measuring the signal generated by the hybridization of the dual
labeled control set of fluorescent cDNA to the comprehensive
micro-array of step (c);

(f) subjecting the prokaryote or eukaryote of step (a) to a gene
expression altering condition whereby the gene expression profile
of the prokaryote or eukaryote is altered to produce a modified
prokaryote or eukaryote;

(g) generating an experimental set of fluorescent cDNA from total or
polyadenylated RNA, the RNA isolated from the modified
prokaryote or eukaryote of step (e), the fluorescent cDNA
comprising the first fluorescent label and the different second
fluorescent label of step (b);

(h) mixing the experimental set of fluorescent cDNA labeled with the
at least one first label with the experimental set of fluorescent
cDNA labeled with the at least one second label to form a dual
labeled experimental cDNA;

(i) hybridizing the experimental set of fluorescent cDNA of step (h) to
the comprehensive micro-array of step (a), wherein hybridization
results in a detectable signal generated from the fluorescent cDNA;

(j) measuring the signal generated by the hybridization of the second
set of fluorescent cDNA to the comprehensive micro-array of
step (g); and

(k) comparing signal generated from the dual labeled control
hybridization with the dual labeled experimental hybridization to
identify gene expression changes within a prokaryotic or
eukaryotic species.
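
One way to picture how combining two labels can damp dye-specific noise is to average, per ORF, the signals read in the two dye channels before comparing control and experimental samples. This is a hedged sketch, not the procedure of the claims: it assumes the two channels are read separately and averaged arithmetically, and every name and number below is hypothetical.

```python
from statistics import mean

def dual_label_signal(first_label, second_label):
    """Average the per-ORF signals obtained with the two fluorescent labels."""
    shared = first_label.keys() & second_label.keys()
    return {orf: mean((first_label[orf], second_label[orf])) for orf in shared}

def fold_changes(control, experimental):
    """Experimental / control signal for every ORF with a positive control signal."""
    shared = control.keys() & experimental.keys()
    return {orf: experimental[orf] / control[orf] for orf in shared if control[orf] > 0}

# Invented readings: the second label reads somewhat brighter than the first.
control_first = {"lacZ": 100.0, "recA": 300.0}
control_second = {"lacZ": 140.0, "recA": 340.0}
experimental_first = {"lacZ": 4000.0, "recA": 310.0}
experimental_second = {"lacZ": 5600.0, "recA": 330.0}

control = dual_label_signal(control_first, control_second)
experimental = dual_label_signal(experimental_first, experimental_second)
changes = fold_changes(control, experimental)
print(changes)
```

Because each sample is measured with both labels, a bias that belongs to one dye affects control and experimental signals alike and largely cancels in the ratio.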

In another embodiment the invention provides a method for quantitating the
amount of protein specifying RNA contained within a genome comprising:

(a) providing a comprehensive micro-array comprising a multiplicity
of genes synthesized from genomic DNA comprised in a
prokaryotic or eukaryotic organism;

(b) generating a set of fluorescent cDNA from total or poly-adenylated
RNA isolated from the prokaryotic or eukaryotic organism of
step (a);

(c) generating a set of fluorescent DNA from genomic DNA isolated
from the prokaryotic or eukaryotic organism of step (a);

(d) hybridizing the fluorescent cDNA of step (b) to the comprehensive
micro-array of step (a), wherein hybridization results in a first
fluorescent signal generated from the fluorescent cDNA for each
gene;

(e) hybridizing the fluorescent DNA of step (c) to the comprehensive
micro-array of step (a), wherein hybridization results in a second
fluorescent signal generated from the fluorescent DNA for each
gene; and

(f) dividing, for each open reading frame, the first fluorescent signal
into the second fluorescent signal to provide a quantitated measure
of the amount of protein specifying RNA for each gene.
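
The arithmetic of step (f) can be illustrated with a short sketch. It reads the step as normalizing each gene's cDNA signal by its genomic DNA signal, so that spot-to-spot differences in the amount of arrayed DNA cancel; that reading of the direction of the division is an assumption, as are the function names and the invented signal values. The second helper expresses each gene as a fraction of the summed signal, in the spirit of the fractional analysis named for Figure 3.

```python
def rna_per_gene(cdna_signal, gdna_signal):
    """Divide the cDNA signal by the genomic DNA signal for each ORF.

    Assumption: the genomic DNA signal serves as a per-spot reference.
    ORFs with no usable genomic DNA signal are skipped.
    """
    shared = cdna_signal.keys() & gdna_signal.keys()
    return {orf: cdna_signal[orf] / gdna_signal[orf]
            for orf in shared if gdna_signal[orf] > 0}

def as_fractions(values):
    """Express each ORF's value as a fraction of the total over all ORFs."""
    total = sum(values.values())
    return {orf: v / total for orf, v in values.items()}

# Invented fluorescence readings for three hypothetical ORFs.
cdna = {"orfA": 200.0, "orfB": 50.0, "orfC": 750.0}
gdna = {"orfA": 100.0, "orfB": 100.0, "orfC": 250.0}

normalized = rna_per_gene(cdna, gdna)
fractions = as_fractions(normalized)
for orf in sorted(fractions):
    print(orf, round(normalized[orf], 2), round(fractions[orf], 3))
```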

The methods of the present invention are applicable to genomes contained
within a variety of organisms including bacteria, cyanobacteria, yeasts,
filamentous fungi, plant cells and animal cells.

The present methods of identifying gene expression changes within a
genome may be additionally coupled with the methods of quantitating the amount
of protein specifying RNA contained within a genome as disclosed herein.

BRIEF DESCRIPTION OF THE DRAWINGS AND SEQUENCE DESCRIPTIONS

Figure 1A describes the gene expression analysis of IPTG induction in a
single hybridization experiment using different slide sets as capture reagents for
Cy3-labeled cDNA derived from treated and control cells and plotted in log-log
form.

Figure 1B describes the gene expression analysis of IPTG induction by
labeling the control sample with Cy5 and the induced sample with Cy3 before
hybridizing to a single set of 3 slides.

Figure 1C describes an average of induced RNA and control RNA with
Cy3 from IPTG induction, generated by label swapping.

Figure 1D describes data replicating the results shown in Figure 1C.

Figure 1E describes an averaging of the data of Figure 1C and Figure 1D.

Figure 2 describes the distribution of gene expression levels for cells
grown in minimal or rich medium.

Figure 3 describes the fractional (summed open reading frame
transcripts/total open reading frame transcripts) analysis of gene expression.

The invention can be more fully understood from the following detailed
description and the accompanying sequence descriptions which form a part of this
application.

The following sequences comply with 37 C.F.R. 1.821-1.825
("Requirements for Patent Applications Containing Nucleotide Sequences and/or
Amino Acid Sequence Disclosures - the Sequence Rules") and are consistent with
World Intellectual Property Organization (WIPO) Standard ST.25 (1998) and the
sequence listing requirements of the EPO and PCT (Rules 5.2 and 49.5(a-bis), and
Section 208 and Annex C of the Administrative Instructions). The symbols and
format used for nucleotide and amino acid sequence data comply with the rules set
forth in 37 C.F.R. §1.822.

SEQ ID NO:1 and 2 are primers used in the amplification of the sdiA gene.

DETAILED DESCRIPTION OF THE INVENTION 
Applicants have solved the stated problem by providing a method to
measure comprehensive mRNA expression of E. coli using a high density DNA
microarray with a near-complete collection of E. coli open reading frames (ORFs).
The present invention advances the art by providing:

WO 01/29261 PCT/US00/28352

(i) the first instance of a comprehensive micro-array comprising greater
than 75% of all open reading frames from a prokaryotic organism, overcoming the
problems of high concentration of endogenous RNase and ribosomal RNA;

(ii) a method for quantitating the amount of each protein specifying RNA
contained within a culture; and

(iii) a method for decreasing the background noise generated within a gene
expression profile through the combination of multiple signal generating labels.

The present invention has utility in many different fields. Many discovery
compounds can be screened by comparing their gene expression profile to a
known compound that affects the desirable target gene products. Additionally,
gene expression profiles are good indicators of genotypic alterations among
strains. The present invention may allow the discovery of complementary target
inhibitors in combination drug-therapy and may be used as a modeling system to
test perturbations in process conditions to determine the conditions for the high
yield of desired production in various bio-processes and biotransformations.

In this disclosure, a number of terms and abbreviations are used. The
following definitions are provided.

"Open reading frame" is abbreviated ORF. The term "ORF" refers to a
gene that specifies a protein.

"Polymerase chain reaction" is abbreviated PCR.

The term "micro-array" means an array of regions having a density of
discrete regions of oligonucleotides of at least about 100/cm², and preferably at
least about 1000/cm².

The term "comprehensive micro array" refers to high-density micro-array
containing at least 75% of all open reading frames of the organism.
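
The 75% criterion in this definition is a simple coverage test. The sketch below shows one way to check it, assuming the ORFs on the array and in the genome are available as lists of identifiers; the identifier format and the 4000-ORF genome are invented for the example.

```python
def orf_coverage(arrayed_orfs, genome_orfs):
    """Fraction of the genome's ORFs that are represented on the array."""
    genome = set(genome_orfs)
    if not genome:
        raise ValueError("genome ORF list is empty")
    return len(genome & set(arrayed_orfs)) / len(genome)

def is_comprehensive(arrayed_orfs, genome_orfs, min_fraction=0.75):
    """True when the array carries at least `min_fraction` of all ORFs."""
    return orf_coverage(arrayed_orfs, genome_orfs) >= min_fraction

# Hypothetical genome of 4000 ORFs, of which 3200 were amplified and spotted.
genome_orfs = [f"orf{i:04d}" for i in range(4000)]
arrayed_orfs = genome_orfs[:3200]

print(orf_coverage(arrayed_orfs, genome_orfs))
print(is_comprehensive(arrayed_orfs, genome_orfs))
```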

The term "expression profile" refers to the expression of groups of genes.

The term "gene expression profile" refers to the expression of an
individual gene and of suites of individual genes.

The "comprehensive expression profile" refers to the gene expression
profile of more than 75% of all genes in the genome.

The term "high density" as used in conjunction with micro-array means
an array having an array density of generally greater than about 60, more
generally greater than about 100, most generally greater than about 600, often
greater than about 1000, more often greater than about 5,000, most often greater
than about 10,000, preferably greater than about 40,000, more preferably greater
than about 100,000, and most preferably greater than about 400,000 different
nucleic acids per cm².

As used herein, an "isolated nucleic acid fragment" is a polymer of RNA
or DNA that is single- or double-stranded, optionally containing synthetic,
non-natural or altered nucleotide bases. An isolated nucleic acid fragment in the form
of a polymer of DNA may be comprised of one or more segments of cDNA,
genomic DNA or synthetic DNA.

The term "probe" refers to a single-stranded nucleic acid molecule that can
base pair with a complementary single stranded target nucleic acid to form a
double-stranded molecule.

The term "genotype" refers to the genetic constitution of an organism as
distinguished from its physical appearance.

The term "genomic DNA" refers to total DNA from an organism.

The term "total RNA" refers to non-fractionated RNA from an organism.

The term "protein specifying RNA" or "protein specifying transcript" or
"mRNA" refers to RNA derived from an ORF.

The term "label" will refer to a substance which may be incorporated into
DNA or RNA which will emit a detectable signal under various conditions.
Typically a label will be a fluorescent moiety.

A nucleic acid molecule is "hybridizable" to another nucleic acid
molecule, such as a cDNA, genomic DNA, or RNA, when a single stranded form
of the nucleic acid molecule can anneal to the other nucleic acid molecule under
the appropriate conditions of temperature and solution ionic strength.
Hybridization and washing conditions are well known and exemplified in
Sambrook, J., Fritsch, E. F. and Maniatis, T. Molecular Cloning: A Laboratory
Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring
Harbor (1989), particularly Chapter 11 and Table 11.1 therein. The conditions of
temperature and ionic strength determine the "stringency" of the hybridization.
Hybridization requires that the two nucleic acids contain complementary
sequences, although depending on the stringency of the hybridization, mismatches
between bases are possible. The appropriate stringency for hybridizing nucleic
acids depends on the length of the nucleic acids and the degree of
complementation, variables well known in the art. The greater the degree of
similarity or homology between two nucleotide sequences, the greater the value of
Tm for hybrids of nucleic acids having those sequences. The relative stability
(corresponding to higher Tm) of nucleic acid hybridizations decreases in the
following order: RNA:RNA, DNA:RNA, DNA:DNA. For hybrids of greater
than 100 nucleotides in length, equations for calculating Tm have been derived
(see Sambrook et al., supra, 9.50-9.51). For hybridizations with shorter nucleic
acids, i.e., oligonucleotides, the position of mismatches becomes more important,
and the length of the oligonucleotide determines its specificity (see Sambrook
et al., supra, 11.7-11.8). Furthermore, the skilled artisan will recognize that the
temperature and wash solution salt concentration may be adjusted as necessary
according to factors such as length of the probe.
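
As an illustration of how temperature, ionic strength, base composition and length enter a Tm estimate for long hybrids, the sketch below uses a widely cited empirical approximation for DNA:DNA duplexes longer than about 100 nucleotides. Treating it as the equation meant by the citation above is an assumption; the function name, the parameter names and the example values are hypothetical.

```python
import math

def tm_long_hybrid(na_molar, gc_percent, length_nt, formamide_percent=0.0):
    """Approximate melting temperature (deg C) of a long DNA:DNA hybrid.

    Empirical form assumed here:
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%G+C) - 0.63*(%formamide) - 600/length
    """
    if na_molar <= 0 or length_nt <= 0:
        raise ValueError("salt concentration and length must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / length_nt)

# Example: a 600-nt hybrid with 50% G+C in 1 M Na+, with and without formamide.
print(f"{tm_long_hybrid(1.0, 50.0, 600):.1f}")
print(f"{tm_long_hybrid(1.0, 50.0, 600, formamide_percent=50.0):.1f}")
```

Lower salt, added formamide and shorter hybrids all lower the estimate, which is how hybridization and wash conditions are tuned toward higher stringency.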
The term "complementary" is used to describe the relationship between
nucleotide bases that are capable of hybridizing to one another. For example, with
respect to DNA, adenine is complementary to thymine and cytosine is
complementary to guanine.
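
The base-pairing rule just stated can be written out as a small sketch. It assumes DNA sequences given as strings over A, C, G and T read 5' to 3'; the helper names are hypothetical.

```python
# Pairing rule for DNA: A pairs with T, C pairs with G.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the strand that would base pair with `seq`, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def is_complementary(strand_a, strand_b):
    """True when the two strands are perfectly complementary over their full length."""
    return reverse_complement(strand_a) == strand_b.upper()

print(reverse_complement("ATGC"))
print(is_complementary("ATGC", "GCAT"))
```

The sketch tests perfect complementarity only; hybridization under reduced stringency tolerates some mismatched positions.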

"Gene" refers to the part of the genome specifying a macromolecular 
product, be it RNA or a protein, and includes regulatory sequences preceding
(5' non-coding sequences) and following (3' non-coding sequences) the coding
sequence.

A "genetic site" refers to a genomic region at which a gene product
operates.

"Coding sequence" or "open reading frame" (ORF) refers to a DNA
sequence that codes for a specific amino acid sequence. "Suitable regulatory
sequences" refer to nucleotide sequences located upstream (5' non-coding
sequences), within, or downstream (3' non-coding sequences) of a coding
sequence, and which influence the transcription, RNA processing or stability, or
translation of the associated coding sequence. Regulatory sequences may include
promoters, translation leader sequences, introns, and polyadenylation recognition
sequences.

"Promoter" refers to a DNA sequence capable of controlling the 
expression of a coding sequence or functional RNA. In general, a coding
sequence is located 3' to a promoter sequence. Promoters may be derived in their
entirety from a native gene, or be composed of different elements derived from
different promoters found in nature, or even comprise synthetic DNA segments. It
is understood by those skilled in the art that different promoters may direct the
expression of a gene in different tissues or cell types, or at different stages of
development, or in response to different environmental conditions. Promoters
which cause a gene to be expressed in most cell types at most times are commonly
referred to as "constitutive promoters". It is further recognized that since in most
cases the exact boundaries of regulatory sequences have not been completely
defined, DNA fragments of different lengths may have identical promoter activity.

"RNA transcript" refers to the product resulting from RNA polymerase-catalyzed
transcription of a DNA sequence. When the RNA transcript is the
polymer product of an RNA polymerase, it is referred to as the primary transcript,
or it may be an RNA sequence derived from post-transcriptional processing of the
primary transcript, in which case it is referred to as the mature RNA. "Messenger RNA
(mRNA)" refers to the RNA that is without introns and that can be translated into
protein by the cell. "cDNA" refers to a double-stranded DNA that is
complementary to and derived from mRNA.
The term "expression", as used herein, refers to the transcription and stable
accumulation of sense (mRNA) or antisense RNA derived from genomic DNA.
Expression may also refer to translation of mRNA into a polypeptide.

The term "stress" or "environmental stress" refers to the condition
produced in a cell as the result of exposure to an environmental insult.

The term "insult" or "environmental insult" refers to any substance or
environmental change that results in an alteration of normal cellular metabolism in
a bacterial cell or population of cells. Environmental insults may include, but are
not limited to, chemicals, environmental pollutants, heavy metals, changes in
temperature, changes in pH, as well as agents producing oxidative damage, DNA
damage, anaerobiosis, and changes in nitrate availability or pathogenesis.

The term "stress response" refers to the cellular response to an
environmental insult.

The term "stress gene" refers to any gene whose transcription is induced as
a result of environmental stress or by the presence of an environmental insult.

The term "modified bacterial species" refers to a bacterial culture that has
been exposed to a stress or insult such that it demonstrates a change in its
gene expression profile. Typically the modified bacterial species is produced as
the result of induction or challenge of the culture with a chemical or
environmental challenge. Similarly, a "modified prokaryotic or eukaryotic
species" refers to either a prokaryotic or eukaryotic organism that has been
exposed to a stress or insult such that the gene expression profile of that organism
has been altered.

The term "log phase", "log phase growth", "exponential phase" or
"exponential phase growth" refers to cell cultures of organisms growing under
conditions permitting the exponential multiplication of the cell number.

The term "growth-altering environment" refers to energy, chemicals, or
living things that have the capacity to either inhibit cell growth or kill cells.
Inhibitory agents may include but are not limited to mutagens, antibiotics, UV
light, gamma-rays, x-rays, extreme temperature, phage, macrophages, organic
chemicals and inorganic chemicals.

Standard recombinant DNA and molecular cloning techniques used here
are well known in the art and are described by Sambrook, J., Fritsch, E. F. and
Maniatis, T., Molecular Cloning: A Laboratory Manual, Second Edition, Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989) (hereinafter
"Maniatis"); and by Silhavy, T. J., Berman, M. L. and Enquist, L. W.,
Experiments with Gene Fusions, Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, NY (1984); and by Ausubel, F. M. et al., Current Protocols in
Molecular Biology, published by Greene Publishing Assoc. and
Wiley-Interscience (1987).

The present invention provides a method to measure the changes in gene
expression profiles of prokaryotic organisms. The present invention also provides
a method to measure the levels of protein specifying RNA in prokaryotic and/or
eukaryotic organisms. The present invention provides a method to compare the
gene expression patterns of two samples differing in one variable. The variables
may include but are not limited to genotype, media, temperature, depletion or
addition of nutrient, addition of an inhibitor, physical assault, irradiation, heat,
cold, elevated or lowered pressure, desiccation, low or high ionic strength, and
growth phases.

Gene expression profiles were determined under the following conditions
to find: (a) differences in gene expression profiles caused by growth of E. coli in
either minimal or rich medium, (b) changes in gene expression associated with the
transition from exponential phase to stationary phase growth in minimal medium,
(c) the specificity of induction mediated by isopropylthiogalactoside (IPTG),
the classic lac operon inducer, (d) the specificity of expression changes mediated
by the amplification of sdiA, a positive activator of an operon that includes
ftsQAZ, genes essential for septation, and (e) the changes in gene expression
patterns with cells that cannot turn on the SOS stress response in comparison to
wild type response when the cells are exposed to mitomycin C (MMC).

In its most basic form the present invention creates a comprehensive
micro-array from a bacterial genome. Any bacterium is suitable for analysis by the
method of the present invention, where enteric bacteria (Escherichia and
Salmonella, for example) as well as cyanobacteria (such as Rhodobacter and
Synechocystis) and Bacillus, Acinetobacter, Streptomyces, Methylobacter, and
Pseudomonas are particularly suitable.

One of skill in the art will appreciate that in order to measure the
transcription level (and thereby the expression level) of a gene or genes, it is
desirable to provide a nucleic acid sample comprising mRNA transcript(s) of the
gene or genes, or nucleic acids derived from the mRNA transcript(s). As used
herein, a nucleic acid derived from an mRNA transcript refers to a nucleic acid for
whose synthesis the mRNA transcript or a subsequence thereof has ultimately
served as a template. Thus, a cDNA reverse transcribed from an mRNA, an RNA
transcribed from that cDNA, a DNA amplified from the cDNA, an RNA
transcribed from the amplified DNA, etc., are all derived from the mRNA
transcript and detection of such derived products is indicative of the presence
and/or abundance of the original transcript in a sample. Thus, suitable samples
include, but are not limited to, mRNA transcripts of the gene or genes, cDNA
reverse transcribed from the mRNA, cRNA transcribed from the cDNA, DNA
amplified from the genes, RNA transcribed from amplified DNA, and the like.

Typically the genes are amplified by methods of primer directed
amplification such as polymerase chain reaction (PCR) (U.S. Patent
No. 4,683,202 (1987, Mullis, et al.) and U.S. Patent No. 4,683,195 (1986, Mullis,
et al.)), ligase chain reaction (LCR) (Tabor et al., Proc. Natl. Acad. Sci. U.S.A., 82,
1074-1078 (1985)) or strand displacement amplification (Walker et al., Proc. Natl.
Acad. Sci. U.S.A., 89, 392 (1992)), for example.

The micro-array is comprehensive in that it incorporates at least 75% of all
ORF's present in the genome. Amplified ORF's are then spotted on slides
comprised of glass or some other solid substrate by methods well known in the art
to form a micro-array. Methods of forming high density arrays of
oligonucleotides with a minimal number of synthetic steps are known (see for
example Brown et al., U.S. Patent No. 6,110,426). The oligonucleotide analogue
array can be synthesized on a solid substrate by a variety of methods, including,
but not limited to, light-directed chemical coupling and mechanically directed
coupling. See Pirrung et al., U.S. Pat. No. 5,143,854 (see also PCT Application
No. WO 90/15070) and Fodor et al., PCT Publication Nos. WO 92/10092 and
WO 93/09668, which disclose methods of forming vast arrays of peptides,
oligonucleotides and other molecules using, for example, light-directed synthesis
techniques. See also, Fodor et al., Science, 251, 767-77 (1991).

Bacteria typically contain from about 2000 to about 6000 ORF's per
genome and the present method is suitable for genomes of this size, where
genomes of about 4000 ORF's are most suitable. The ORF's are arrayed in high
density on at least one glass microscope slide. This is in contrast to a low density
array where ORF's are arrayed on a membranous material such as nitrocellulose.
The small surface area of the high density array (often less than about 10 cm²,
preferably less than about 5 cm², more preferably less than about 2 cm², and most
preferably less than about 1.6 cm²) permits extremely uniform hybridization
conditions (temperature regulation, salt content, etc.).
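
The relationship between genome size, arrayed area and spot density can be made concrete with a short calculation. The sketch assumes, purely for illustration, that about 4000 ORFs are each spotted once within a 1.6 cm² area, and compares the result with the micro-array density thresholds of about 100/cm² and about 1000/cm² given in the definitions; the function names are hypothetical.

```python
def spots_per_cm2(n_spots, area_cm2):
    """Spot density of an array, in spots per square centimetre."""
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    return n_spots / area_cm2

def meets_density(density, minimum=100.0):
    """True when the density reaches the stated minimum (spots per cm^2)."""
    return density >= minimum

# Assumed layout: 4000 ORFs, one spot each, within 1.6 cm^2.
density = spots_per_cm2(4000, 1.6)
print(f"{density:.0f} spots per cm^2")
print(meets_density(density), meets_density(density, minimum=1000.0))
```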

Once all the genes or ORF's from the genome are amplified, isolated and
arrayed, a set of probes bearing a signal generating label is synthesized. Probes
may be randomly generated or may be synthesized based on the sequence of
specific open reading frames. Probes of the present invention are typically single
stranded nucleic acid sequences which are complementary to the nucleic acid
sequences to be detected. Probes are "hybridizable" to the ORFs. The probe
length can vary from 5 bases to tens of thousands of bases, and will depend upon
the specific test to be done. Typically a probe length of about 15 bases to about
30 bases is suitable. Only part of the probe molecule need be complementary to
the nucleic acid sequence to be detected. In addition, the complementarity
between the probe and the target sequence need not be perfect. Hybridization
does occur between imperfectly complementary molecules with the result that a
certain fraction of the bases in the hybridized region are not paired with the proper
complementary base.

Signal generating labels that may be incorporated into the probes are well
known in the art. For example, labels may include but are not limited to
fluorescent moieties, chemiluminescent moieties, particles, enzymes, radioactive
tags, or light emitting moieties or molecules, where fluorescent moieties are
preferred. Most preferred are fluorescent dyes capable of attaching to nucleic
acids and emitting a fluorescent signal. A variety of dyes are known in the art
such as fluorescein, Texas red, and rhodamine. Preferred in the present invention
are the mono reactive dyes Cy3 (146368-16-3) and Cy5 (146368-14-1), both
available commercially (e.g., Amersham Pharmacia Biotech, Arlington Heights,
IL). Suitable dyes are discussed in U.S. Patent No. 5,814,454, hereby incorporated
by reference.

Labels may be incorporated by any of a number of means well known to
those of skill in the art. However, in a preferred embodiment, the label is
simultaneously incorporated during the amplification step in the preparation of the
probe nucleic acids. Thus, for example, polymerase chain reaction (PCR) with
labeled primers or labeled nucleotides will provide a labeled amplification
product. In a preferred embodiment, reverse transcription or replication, using a
labeled nucleotide (e.g., dye-labeled UTP and/or CTP), incorporates a label into the
transcribed nucleic acids.

Alternatively, a label may be added directly to the original nucleic acid
sample (e.g., mRNA, polyA mRNA, cDNA, etc.) or to the amplification product
after the synthesis is completed. Means of attaching labels to nucleic acids are
well known to those of skill in the art and include, for example, nick translation or
end-labeling (e.g., with a labeled RNA) by kinasing of the nucleic acid and
subsequent attachment (ligation) of a nucleic acid linker joining the sample
nucleic acid to a label (e.g., a fluorophore).

Following incorporation of the label into the probe, the probes are then
hybridized to the micro-array using standard conditions where hybridization
results in a double stranded nucleic acid, generating a detectable signal from the
label at the site of capture reagent attachment to the surface. Typically the probe
and array must be mixed with each other under conditions which will permit
nucleic acid hybridization. This involves contacting the probe and array in the
presence of an inorganic or organic salt under the proper concentration and
temperature conditions. The probe and array nucleic acids must be in contact for a
long enough time that any possible hybridization between the probe and sample
nucleic acid may occur. The concentration of probe or array in the mixture will
determine the time necessary for hybridization to occur. The higher the probe or
array concentration, the shorter the hybridization incubation time needed.
Optionally a chaotropic agent may be added. The chaotropic agent stabilizes
nucleic acids by inhibiting nuclease activity. Furthermore, the chaotropic agent
allows sensitive and stringent hybridization of short oligonucleotide probes at
room temperature [Van Ness and Chen (1991) Nucl. Acids Res. 19:5143-5151].
Suitable chaotropic agents include guanidinium chloride, guanidinium
thiocyanate, sodium thiocyanate, lithium tetrachloroacetate, sodium perchlorate,
rubidium tetrachloroacetate, potassium iodide, and cesium trifluoroacetate, among
others. Typically, the chaotropic agent will be present at a final concentration of
about 3 M. If desired, one can add formamide to the hybridization mixture,
typically 30-50% (v/v).

Various hybridization solutions can be employed. Typically, these
comprise from about 20 to 60% volume, preferably 30%, of a polar organic
solvent. A common hybridization solution employs about 30-50% v/v
formamide, about 0.15 to 1 M sodium chloride, about 0.05 to 0.1 M buffers,
such as sodium citrate, Tris-HCl, PIPES or HEPES (pH range about 6-9), about
0.05 to 0.2% detergent, such as sodium dodecylsulfate, or between 0.5-20 mM
EDTA, FICOLL (Pharmacia Inc.) (about 300-500 kilodaltons),
polyvinylpyrrolidone (about 250-500 kdal), and serum albumin. Also included
in the typical hybridization solution will be unlabeled carrier nucleic acids from
about 0.1 to 5 mg/mL, fragmented nucleic DNA, e.g., calf thymus or salmon
sperm DNA, or yeast RNA, and optionally from about 0.5 to 2% wt./vol.
glycine. Other additives may also be included, such as volume exclusion agents
which include a variety of polar water-soluble or swellable agents, such as
polyethylene glycol, anionic polymers such as polyacrylate or
polymethylacrylate, and anionic saccharidic polymers, such as dextran sulfate.
Methods of optimizing hybridization conditions are well known to those of skill
in the art (see, e.g., Laboratory Techniques in Biochemistry and Molecular
Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed.
Elsevier, N.Y., (1993)) and Maniatis, supra.

The basis of gene expression profiling via micro-array technology relies on
comparing an organism under a variety of conditions that result in alteration of the
genes expressed. Within the context of the present invention a single population
of cells was exposed to a variety of stresses that resulted in the alteration of gene
expression. Alternatively, the cellular environment may be kept constant and the
genotype may be altered. Typical stresses that result in an alteration in gene
expression profile include, but are not limited to, conditions altering the growth
of a cell or strain, exposure to mutagens, antibiotics, UV light, gamma-rays,
x-rays, phage, macrophages, organic chemicals, inorganic chemicals,
environmental pollutants, heavy metals, changes in temperature, changes in pH,
conditions producing oxidative damage, DNA damage, anaerobiosis, depletion or
addition of nutrients, addition of a growth inhibitor, and desiccation. Non-stressed
cells are used for generation of "control" arrays and stressed cells are used to
generate "experimental", "stressed" or "induced" arrays.

In an alternate embodiment the present invention provides a method for
quantitating the amount of each protein specifying RNA contained within an
organism. This is often necessary in gene expression profile analysis because the
quantity of transcript produced as well as its fold elevation is needed for
quantitative analysis of the cell's physiological state. The method is applicable to
both prokaryotic and eukaryotic organisms including, for example, cyanobacteria
(such as Rhodobacter and Synechocystis), yeasts (such as Saccharomyces,
Zygosaccharomyces, Kluyveromyces, Candida, Hansenula, Debaryomyces,
Mucor, Pichia and Torulopsis), filamentous fungi (such as Aspergillus and
Arthrobotrys), plant cells and animal cells. The method proceeds by generating a
comprehensive micro-array as described above, from either total or
poly-adenylated RNA, depending on whether the organism is prokaryotic or
eukaryotic. Following the generation of the array, a set of labeled DNA and a set
of labeled cDNA are synthesized having complementarity to the ORFs of the
array. The signals generated from the independent hybridization of either the
labeled DNA or cDNA are used to quantitate the amount of protein specifying
RNA contained within a genome.

In another embodiment the invention provides a method for gene
expression profiling with a reduced signal to noise ratio. This is accomplished
using a dual "label swapping" method and is again applicable to both prokaryotic
and eukaryotic genomes. "Label swapping" refers to a system where a set of
probes or cDNA generated from control or experimental conditions are labeled
with two different labels and mixed prior to hybridization with the array. Two sets
of control and experimental probes or cDNAs are generated. One of the control
sets is labeled with a first label (e.g., Cy3) and the other is labeled with a different
second label (e.g., Cy5). The two differently labeled sets are mixed and then
hybridized with the array. The same process is repeated for the experimental
conditions and the resulting control and experimental fluorescent signals are
compared. This combination of signals (a) provides an additional measure of each
transcript level and (b) allows for the canceling of any bias associated with
differential incorporation of fluorescently labeled nucleotide into cDNA or the
hybridization of that cDNA.
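
By way of illustration only, the bias-canceling effect of label swapping can be sketched numerically. The following Python fragment is not part of the disclosed method; the function names and the simple multiplicative dye-bias model are assumptions introduced solely for this example. It averages the log ratios from a hybridization and its reciprocal, so that a constant per-dye bias drops out.

```python
import math


def swap_averaged_log_ratio(fwd_exp, fwd_ctl, rev_exp, rev_ctl):
    """Average log2(experimental/control) over two reciprocal labelings.

    fwd_*: signals where experimental is labeled Cy3 and control Cy5.
    rev_*: signals where experimental is labeled Cy5 and control Cy3.
    """
    forward = math.log2(fwd_exp / fwd_ctl)
    reverse = math.log2(rev_exp / rev_ctl)
    return (forward + reverse) / 2.0


def simulate_swap(true_ratio, dye_bias, control_level=100.0):
    """Hypothetical model: the Cy5 channel reads dye_bias times the Cy3 channel."""
    experimental_level = control_level * true_ratio
    fwd_exp = experimental_level              # experimental in Cy3
    fwd_ctl = control_level * dye_bias        # control in Cy5
    rev_exp = experimental_level * dye_bias   # experimental in Cy5
    rev_ctl = control_level                   # control in Cy3
    return fwd_exp, fwd_ctl, rev_exp, rev_ctl


if __name__ == "__main__":
    signals = simulate_swap(true_ratio=4.0, dye_bias=1.7)
    # The averaged value recovers log2(4) regardless of the assumed dye bias.
    print(swap_averaged_log_ratio(*signals))
```

Under this assumed model the forward ratio is depressed by the bias and the reciprocal ratio is inflated by the same factor, so their average in log space equals the true log ratio. Real dye effects may be gene- or intensity-dependent, in which case the cancellation is only approximate.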

The preferred embodiments of the invention are discussed below. 
Bulk E. coli RNA was reverse transcribed to prepare hybridization probes.
Despite the large amount of stable RNA (ribosomal and transfer RNAs) in the
template, hybridization to protein-encoding genes was readily detected.

As shown in Figure 1 with IPTG induction, conditions have been
optimized to yield highly reliable data. In Figure 1, basal expression levels were
plotted on the ordinate, induced levels on the abscissa. Panel A illustrates the
results obtained when two Cy3-labeled probes were hybridized to duplicate whole
genome array sets. Panel B represents an experiment in which the Cy5-labeled
cDNA copy of control RNA and the Cy3-labeled copy of induced RNA were
co-annealed to a single slide set. The RNAs used to generate the results in
Panel B were each labeled with the other dye to allow a "reciprocal"
hybridization. The resulting data were averaged with the data
presented in Panel B to yield the scatter plot depicted in Panel C. A second
independent set of RNA samples was isolated, their cDNAs labeled with both
dyes and products hybridized in both possible combinations to generate the results
depicted in Panel D. Panel E displays the averaged results of the two independent
experiments depicted in Panels C and D.

Reciprocal Labeling. When the results of a single hybridization experiment using
different slide sets as capture reagents for Cy3-labeled cDNA derived from treated
and control cells were plotted in log-log form, lacZYA induction above the
background was detected (Figure 1A); variation of other genes was also
significant as indicated by the width of the points falling along the diagonal of this
scatter plot. Improvements were observed by labeling the control sample with
Cy5 and the induced sample with Cy3 before hybridizing to a single set of 3 slides
(Figure 1B). However, there was a skewing of the data away from the abscissa
and towards the ordinate (y-axis; Cy5-labeled probe). Averaging of these results
with others obtained using reciprocal copying of the same RNA samples (induced
RNA reverse transcribed with Cy5 and control RNA with Cy3) resulted in a
decreased variation between the treated and control samples (Figure 1C). Such
"label swapping" lessened the skewing and decreased the scatter. The experiment
depicted in Figure 1C was replicated; fresh cultures were induced and nucleic
acids processed to yield the data depicted in Figure 1D. The experiments shown
in Figures 1C and 1D each represent four measurements of individual transcript
abundance; this repetition and averaging yielded the tight constellation shown in
Figure 1E, which combined the data of Figures 1C and 1D. Nonetheless, the scatter
plot resulting from an experiment using the optimized protocol (Figure 1E)
illustrated that measurements of gene expression were still subject to considerable
variation when the signal was in the lowest part of the detectable range.

The effect of 1 mM IPTG upon expression of the arrayed genes was
investigated. Duplicate RNA preparations of the control and IPTG treated cells
were each labeled with Cy3 and Cy5 by first strand cDNA synthesis. Averaging
of measurements gave an optimal reliability of the data (Figure 1). Examination
of the extent of hybridization to any individual gene revealed a wide dynamic
range with more than a thousand fold variation in signal intensity between genes
(see Figure 1). The expression of only 8 genes increased by a factor of more than
2 after exposure to 1 mM IPTG for 15 min (Figure 1E). These induced genes are
listed in Table 1. Two-fold or greater repression was not observed after this
treatment. The most highly induced RNAs corresponded to the lac operon
structural genes. Examples of the induced genes are b0956, melA, uxaA and
b1783.
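
For illustration, the two-fold criterion applied above can be expressed as a short Python sketch. The function name, the dictionary layout, and the numeric signals below are hypothetical and are not data from Table 1; only the thresholds (induction by a factor of more than 2, repression of two-fold or greater) follow the text.

```python
def changed_genes(control, induced, fold=2.0):
    """Return (induced_list, repressed_list) of (gene, ratio) pairs.

    control, induced: dicts mapping gene name to averaged signal.
    A gene is called induced when induced/control exceeds `fold`, and
    repressed when the ratio is at or below 1/fold.
    """
    up, down = [], []
    for gene, basal in control.items():
        treated = induced.get(gene)
        if treated is None or basal <= 0 or treated <= 0:
            continue  # skip genes without a usable signal in both samples
        ratio = treated / basal
        if ratio > fold:
            up.append((gene, ratio))
        elif ratio <= 1.0 / fold:
            down.append((gene, ratio))
    up.sort(key=lambda pair: pair[1], reverse=True)
    down.sort(key=lambda pair: pair[1])
    return up, down


if __name__ == "__main__":
    # Hypothetical averaged signals, for demonstration only.
    basal = {"lacZ": 10.0, "melA": 40.0, "gapA": 900.0}
    treated = {"lacZ": 400.0, "melA": 100.0, "gapA": 950.0}
    print(changed_genes(basal, treated))
```

Genes lacking a positive signal in either sample are skipped rather than assigned a ratio, consistent with the observation above that measurements near the bottom of the detectable range are subject to considerable variation.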

Signal Quantitation. The present invention was applied to monitor the effects of
growth stage and medium on gene expression. For these embodiments, signal
quantitation was important. The percentage of RNA that programs protein
synthesis has been determined under a wide variety of growth regimes (Bremer
and Dennis, Escherichia coli and Salmonella: Cellular and Molecular Biology,
ASM Press: 922-937 (1996)). The fraction of those protein-specifying transcripts
devoted to each arrayed gene was estimated. Hybridization signals arising from
annealing of RNA-derived Cy3-labeled cDNA populations were quantitated by
dividing by the signal generated using Cy3 fluorescent DNA arising from copying
of sheared E. coli genomic DNA as a probe. The probe synthesized by copying
genomic DNA was used to approximate equimolar transcription of the entire
genome. This quantitation allowed calculation of mRNA inventories. Three
RNA samples were measured. The samples were isolated from cells growing
exponentially in rich medium, from cells growing exponentially in minimal
medium, and from cells in minimal medium transitioning from exponential to
stationary phase. RNAs from certain central metabolic (gapA, ptsH), defense
(ahpC, cspC), DNA metabolic (hns), surface structure (acpP, ompACFT, lpp),
translation (rplBCKLMPWX, rpmBCI, rpsACDHJNS, trmD, fusA, infC, tufAB),
transcription (rpoAB), and unassigned (b4243) genes (Riley and Labedan,
Escherichia coli and Salmonella: Cellular and Molecular Biology, ASM Press:
2118-2202 (1996)) were abundant (>0.1%, among the top 100 transcripts) in all
three samples.
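
The quantitation just described may be sketched as follows. This Python fragment is offered as an illustration under stated assumptions: the function name is hypothetical, and expressing each gene as a percentage of the summed, genomic-normalized signal is one plausible reading of "the fraction of those protein-specifying transcripts devoted to each arrayed gene", not a procedure recited verbatim above.

```python
def transcript_fractions(cdna_signal, genomic_signal):
    """Estimate each gene's share (percent) of the protein-specifying transcripts.

    cdna_signal: gene -> signal from the RNA-derived labeled cDNA probe.
    genomic_signal: gene -> signal from the labeled genomic DNA probe, used
        here as a stand-in for equimolar representation of every gene.
    """
    ratios = {}
    for gene, rna in cdna_signal.items():
        dna = genomic_signal.get(gene, 0.0)
        if dna > 0:  # spots without a genomic signal cannot be normalized
            ratios[gene] = rna / dna
    total = sum(ratios.values())
    if total <= 0:
        return {}
    return {gene: 100.0 * value / total for gene, value in ratios.items()}


if __name__ == "__main__":
    # Hypothetical signals, for demonstration only.
    cdna = {"geneA": 300.0, "geneB": 100.0, "geneC": 50.0}
    genomic = {"geneA": 100.0, "geneB": 100.0, "geneC": 0.0}
    print(transcript_fractions(cdna, genomic))
```

Dividing by the genomic DNA signal corrects for spot-to-spot differences in the amount of arrayed DNA and in hybridization efficiency, so that the remaining variation more nearly reflects transcript abundance.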

The most highly transcribed genes in actively growing broth-cultured cells
often encoded proteins involved in translation. In contrast, cultures at a similar
growth stage in glucose minimal medium expressed to a very high level several
small molecule biosynthetic genes and the means to utilize glucose. Thus, an
agreement between these molecular analyses and the accumulated understanding
of E. coli physiology was observed (Escherichia coli and Salmonella: Cellular
and Molecular Biology, ASM Press). This agreement was underscored in the
analysis of cells transitioning from the exponential growth phase; the elevated
expression of several rpoS-controlled genes corresponded to expectations
(Escherichia coli and Salmonella: Cellular and Molecular Biology, ASM Press).

The genes, each representing between 0.0007% and 1% of the hybridizing
signal, were expressed in LB grown cells. The distribution of genes as a function
of expression level is plotted in Figure 2. Figure 3 depicts fractional expression as
a function of summed genes with genes ranked by expression level. In Figure 2,
the histogram plots the number of genes as a function of expression range.
Diagonally striped, solid, and horizontally striped bars reflect distributions
observed in RNAs derived from cells growing exponentially in minimal medium,
cells transitioning to stationary phase in minimal medium, and cells growing
exponentially in rich medium, respectively. In Figure 3, the fraction (summed
open reading frame transcripts/total open reading frame transcripts) was plotted as
a function of genes summed. The order in which genes were summed was based
upon expression level with the most highly expressed gene summed first.
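
The two summaries plotted in Figures 2 and 3 can be illustrated with a brief Python sketch. The function names are hypothetical, and binning by powers of ten is an assumption made for this example; the bin boundaries actually used for Figure 2 are not stated in the text above.

```python
import math
from collections import Counter


def cumulative_fraction(levels):
    """Fraction of total expression captured as genes are summed in rank order.

    levels: iterable of per-gene expression levels; the most highly
    expressed gene is summed first, as described for Figure 3.
    """
    ranked = sorted(levels, reverse=True)
    total = sum(ranked)
    running = 0.0
    curve = []
    for value in ranked:
        running += value
        curve.append(running / total)
    return curve


def decade_histogram(levels):
    """Count genes per power-of-ten expression range (assumed binning)."""
    return Counter(math.floor(math.log10(v)) for v in levels if v > 0)


if __name__ == "__main__":
    # Hypothetical expression levels (percent of signal), for demonstration only.
    example = [1.2, 0.5, 0.07, 0.05]
    print(cumulative_fraction(example))
    print(decade_histogram(example))
```

A culture dominated by a few abundant transcripts gives a cumulative curve that rises steeply at low rank, whereas a culture expressing many genes at modest levels gives a more gradual curve.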

Fewer genes were expressed in LB than in minimal medium (Figure 2); the
fraction of rare transcripts appeared under-represented in LB medium (Figure 3).
The fifty most highly expressed genes in broth-grown cells are listed in the left-most
columns of Table 2; twenty-six of these intensely transcribed genes encode
proteins involved in translation while three encode chaperones.

The broad distribution analyses (Figures 2 and 3) readily revealed the
significant differences observed in expression of E. coli when grown in defined
and rich media. In minimal media many more genes were transcribed over a
somewhat broader range. The 50 genes most highly expressed in exponentially
growing cells cultured in minimal medium with glucose as a carbon/energy source
are listed in the middle columns of Table 2. Eight biosynthetic genes were highly
expressed (Table 2). Notable among them were metE, encoding the aerobic
methionine synthase, and ilvC, an isoleucine-valine biosynthetic gene subject to
feed-forward transcriptional activation (Umbarger, H. E., Escherichia coli and
Salmonella: Cellular and Molecular Biology, ASM Press (1996)) by its
substrates. Both the ilvC-encoded enzyme (Petersen et al., Nucleic Acids Res.
14:9631-9651 (1986)) and the metE-encoded enzyme (Green, R. C., Escherichia coli
and Salmonella: Cellular and Molecular Biology, ASM Press (1996)) are sluggish
catalysts. The metE product accounts for about 5% of E. coli protein when cells
are cultured in minimal medium with glucose as a carbon/energy source
(VanBogelen et al., Escherichia coli and Salmonella: Cellular and Molecular
Biology, ASM Press (1996)). Other highly expressed biosynthetic genes included
folE and cysK; the folE product, GTP cyclohydrolase I, catalyzes both cleavage of
the 5-membered ring of guanine and the rearrangement of the ribose moiety of the
substrate, GTP (Green et al., Escherichia coli and Salmonella: Cellular and
Molecular Biology, ASM Press (1996)). cysK, encoding O-acetylserine(thiol)-lyase
isozyme A, is responsible for more than 90% of sulfur fixation under aerobic
conditions (Kredich, N. M., Molecular Biology, ASM Press (1996)). Transcripts
of the pyrBI operon encoding aspartate transcarbamylase also were highly
expressed during exponential growth in minimal medium relative to a
broth-grown culture. This expression level is a characteristic signature of strain
MG1655 whose aspartate transcarbamylase content is elevated more than 100 fold
when grown in the absence of uracil due to an rph mutation that is polar on pyrE
(Jensen, K. F., J. Bacteriol. 181:3525-3535 (1993)). The other highly expressed
transcripts, thrL and pheF, encoded, respectively, the threonine leader polypeptide
(Landick et al., Escherichia coli and Salmonella: Cellular and Molecular Biology,
ASM Press (1996)) and the phenylalanine-inhibited first enzyme of the common
aromatic pathway. The pheF product, one of three isozymes, is estimated to
account for more than 80% of the activity catalyzing the first common step of
aromatic amino acid synthesis (Pittard, A. J., Escherichia coli and Salmonella:
Cellular and Molecular Biology, ASM Press (1996)).

In this embodiment, expression of several genes catalyzing fueling
reactions was also elevated. Unexpectedly, aceAB, encoding the glyoxylate shunt
enzymes malate synthase and isocitrate lyase (Cronan and Laporte, Escherichia
coli and Salmonella: Cellular and Molecular Biology, ASM Press (1996)), was
highly expressed. Perhaps the TCA cycle functions in its branched state during
this phase of growth, requiring the glyoxylate shunt for anaplerotic replenishment
(Neidhardt et al., Physiology of the Bacterial Cell: A Molecular Approach,
Sinauer Associates, Inc. (1990)). As expected, ptsHI transcripts encoding
phosphotransferase sugar transport common components (Postma et al.,
Escherichia coli and Salmonella: Cellular and Molecular Biology, ASM Press
(1996)) also accumulated to a very high titer in glucose-minimal medium.

The present invention was applied to monitor the transcripts of cells
transitioning from exponential to stationary phase in defined, minimal medium.
During this transition, significant changes in gene expression were expected and
observed. Expressed gene levels ranged from 0.0023% to 1.6%. A total of
1030 genes, of which 110 have a defined role, did not appear to be expressed. In
this embodiment, the 50 most highly expressed genes during this transition are
listed in the rightmost columns of Table 2. Significantly, several rpoS-regulated
genes (Hengge-Aronis, Escherichia coli and Salmonella: Cellular and Molecular
Biology, ASM Press, 1497-1512) including hdeA (11 fold), hdeB (8.9 fold), dps
(4.4 fold), gadA (8.2 fold) and gadB (12 fold) (Castanie-Cornet et al., J. Bacteriol.
181:3525-3535 (1999)) as well as rpoS (2.6 fold) itself became quite highly
expressed. Despite this remodeling of transcription, the overall patterns of gene
number as a function of expression level (Figure 2) and fractional expression as a
function of ranked gene (Figure 3) were not as distinct as might have been
expected in comparison to the patterns observed for RNA extracted from
exponentially growing cells.

The observed expression patterns are summarized in Table 3 where gene
products were grouped by metabolic function using an established classification
scheme (Riley and Labedan, Escherichia coli and Salmonella: Cellular and
Molecular Biology, ASM Press (1996)). Exponential growth in minimal medium
elevated the amount of pyrimidine and amino acid biosynthetic transcripts. In
contrast, cofactor and purine transcripts did not appear to accumulate relative to
growth in broth. Expression of glyoxylate shunt and miscellaneous glucose
transcripts was also elevated in minimal medium; the seven-fold elevation of
glyoxylate shunt transcripts exceeded the average of that observed for amino acid
biosynthetic mRNAs. Expression of genes involved in sulfur fixation was also
elevated during growth in minimal medium.

The rapid growth observed in LB was reflected in the gene expression
profile, as was the difference in carbon/energy source between glucose and amino
acids. LB-grown cultures displayed elevated expression of genes specifying
gluconeogenic enzymes and of genes whose products degrade small molecules.
Expression of the ATP and proton motive force generating machinery, elevated by
a factor of about 2, paralleled increased ribosomal protein, aminoacyl-tRNA
synthetase and foldase/usher expression.

Changes observed upon entering the transitional period between
exponential and stationary phase growth were less dramatic. Nonetheless,
elevation of mRNAs specifying gluconeogenic, glycolytic, and TCA cycle
enzymes was observed, as was an increase in transcripts encoding enzymes
responsible for metabolic pool interconversions and for the non-oxidative branch
of the hexose monophosphate shunt. The cells also displayed an increased titer of
foldase/usher-specifying and global regulatory function transcripts while
transitioning between growth phases.

The present invention was used to monitor the change in gene expression
when cells overexpressed the sdiA gene. sdiA encodes a positive activator of an
operon that includes ftsQAZ, genes essential for septation.

RNA isolated from broth grown, exponential phase cultures harboring
either a single copy (pUC19/RFM443) or many copies (pDEW140/RFM443) of
sdiA were compared after conversion into fluorescently labeled cDNA by
hybridization to individual genes arrayed on glass slides.

Expression of about 9% of the E. coli genes was elevated in the strain
containing the multicopy sdiA plasmid (Table 4). Transcripts of seven genes
involved in cell division were raised 2.1 to 11 fold by amplification of sdiA, as
were a large number (about 20) of genes involved in DNA replication, repair, and
degradation. Transcript levels of eight genes whose products alter the
susceptibility of E. coli to drugs were more highly expressed in the strain
containing the gene amplification. This genetic configuration also resulted in
elevated expression of several lipopolysaccharide biosynthetic genes (rfa) as well
as open reading frames encoding membrane structural elements.

Expression of several genes of unknown function was also elevated in
response to the presence of multiple copies of sdiA. The genes whose transcripts
were highly (>6 fold) elevated in response to the multicopy sdiA plasmid
included: b0135 (6.4 fold, annotated as putative fimbrial-like protein gene),
b0225 (6.4 fold, a gene apparently co-transcribed with dinJ since between them
there is only a 3 base pair intergenic region), b0157 (11 fold, encoding a putative
malate dehydrogenase), b0530 (also known as sfmA and predicted to specify a
fimbrial-like protein; elevated 6.5 fold), b0712 (encoding a putative
carboxylase; 6.4 fold increase in transcript content) and b1438 (11 fold
elevation in expression).

Around 3% of the E. coli genes were repressed in a strain harboring the
sdiA plasmid relative to the control strain containing the vector (Table 5). The
genes involved in chemotaxis, motility, and flagella biosynthesis were repressed
dramatically. Genes for transport of certain carbohydrate substrates and cations
(Fe²⁺ and K⁺), degradation of corresponding carbon compounds, as well as
acetate fermentation were repressed. The presence of pDEW140, a pUC19
derivative harboring sdiA, resulted in a 30-fold elevation in detectable sdiA
transcript. Expression of sdiA was very low (0.0015%, the 4212th most abundant
transcript) in LB grown E. coli MG1655. The increased expression in the plasmid
containing strain raised the transcript rank to about 300.
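
The notion of transcript rank used above can be made concrete with a minimal Python sketch. The function name and the abundance values are hypothetical and serve only to show how a rank is read from a set of per-gene abundances; they are not data from the tables.

```python
def transcript_rank(abundance, gene):
    """Return the 1-based rank of `gene`, where rank 1 is the most abundant.

    abundance: dict mapping gene name to its estimated share of the signal.
    """
    ordered = sorted(abundance, key=abundance.get, reverse=True)
    return ordered.index(gene) + 1


if __name__ == "__main__":
    # Hypothetical abundances (percent of signal), for demonstration only.
    control = {"rplB": 1.0, "gapA": 0.6, "ompA": 0.4, "sdiA": 0.0015}
    amplified = dict(control, sdiA=control["sdiA"] * 30)
    print(transcript_rank(control, "sdiA"), transcript_rank(amplified, "sdiA"))
```

Because rank depends on every other transcript in the sample, a 30-fold change in abundance can move a rare transcript by thousands of positions, as reported for sdiA, while the same fold change in an already abundant transcript would shift its rank only slightly.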

Genes ddl, ftsQ, ftsA, ftsZ and lpxC are organized in the order mentioned
above in the complex ftsZ containing operon, and the above genes are transcribed
in the same direction starting with ddl. Since the sdiA-encoded positive activator
drives transcription of a mRNA including ddl, ftsQ, ftsA, ftsZ, and lpxC, increased
quantities of RNA hybridizing to these genes were expected. Amplification of
sdiA due to its presence on a multicopy plasmid elevated expression of ddl, ftsQ,
ftsA, ftsZ and lpxC 4.6, 8.8, 10, 11 and 3.5 fold, respectively, relative to the strain
that harbored pUC19 (Table 4).

Immediately downstream of sdiA lies yecF, followed by uvrY
and uvrC, respectively. The uvrY and uvrC genes are transcribed in the same
direction as sdiA, and yecF is transcribed in the opposite direction.
Unexpectedly, amplification of sdiA elevated expression of the two genes
downstream of sdiA: uvrY expression was elevated 12 fold while uvrC
transcription was increased by a factor of 9 (Table 4). These two genes were
transcribed in the same direction as sdiA. The expression of yecF decreased only
slightly.

Amplification of sdiA caused the expression of 101 genes to fall by a
factor of 2 or more. Among them, 44 were involved in motility and chemotaxis.
Thirty-four genes were down regulated more than five-fold by sdiA amplification.
Of these, thirty were involved in chemotaxis or motility (cheW;
flgB,C,D,E,F,G,H,I,J,K,L,M,N; fl[illegible]; tar and tsr). The
master regulator genes flhC and D controlling flagella operon expression were
lowered by only 30-38%.

The swarming of strains having single or multiple copies of sdiA was
examined by spotting four single colony isolates of each strain on semi-solid
medium. Since almost all the genes involved in flagella biosynthesis, chemotaxis
and motility were dramatically repressed in the sdiA overexpression strain, loss of
mobility of the sdiA overexpression strain was predicted. Experiments were
carried out to compare the mobility of the two strains. After 8 hr at 37°C, the
strain containing pUC19 had swarmed (diameter = 32 ± 2.5 mm) while that
containing pDEW140 (sdiA+) had not (diameter = 3.2 ± 0.4 mm). After 23 hr the
pUC19 containing strain had filled the petri plate while the strain carrying the
sdiA amplification had significantly swarmed, covering about one half of each
plate. This partial phenotype could be explained by either (a) plasmid loss
allowing swarming of a revertant (sdiA+ haploid) population as ampicillin was
exhausted from the medium or (b) sdiA amplification only partially compromising
motility. To distinguish between these possibilities, the site of inoculation and the
edge of the swarm after 23 hr were streaked for single colonies to an ampicillin
containing LB agar plate. Massive sdiA+ plasmid loss from cells at the edge of
the swarm was not observed, suggesting that the motility defective phenotype was
not an absolute one.

If the role of sdiA is to stimulate gene expression required for septation,
sdiA might coordinate expression of the ftsZ-containing operon with action at the
origin of replication, oriC. The two genes immediately flanking oriC are mioC
and gidA. mioC is followed by asnC and asnA, and gidA is followed by gidB, atpI
and atpB. All of the genes except asnA are transcribed in the same direction.
gidA and mioC were over-transcribed relative to the vector-containing control
strain. mioC transcript content was elevated 7 fold while those of the gidA and
gidB genes were elevated 4 and 2 fold, respectively. This effect was most
localized; adjoining genes were not over-expressed.

Having found enhanced action around oriC, it was reasonable to examine
the transcript content of genes surrounding the termini of replication when sdiA
was amplified. There are multiple termini in E. coli. The region surrounding terB
spans minutes 35.3-37.3 (Berlyn et al., Escherichia coli and Salmonella: Cellular
and Molecular Biology, ASM Press: 922-937 (1996)). sdiA amplification elevated
expression of 12 of the 88 genes in this region more than 3 fold. Transcripts from
another 26 genes in the region were elevated by a factor of 1.5 to 3. Unlike the
action observed around the origin, the stimulation seen in the vicinity of terB
was diffuse. Interestingly, tus, encoding the terminus-utilizing factor, was not
over-expressed. Transcription of gusR, located at 36.5 minutes, was elevated
8 fold by sdiA amplification (Table 4).

acr genes specify sensitivity to acriflavines, molecules that intercalate into
double stranded DNA containing monotonic runs of base pairs. Most acr mutants
display a defect in acridine efflux; moreover they are often pleiotropic, being
hypersensitive to a wide variety of chemicals. Thus hyper-expression of these
genes in a strain harboring an sdiA-bearing multicopy plasmid could lead to
mitomycin C expulsion and the observed resistance to this DNA damaging agent.
This expectation of acr hyper-expression was confirmed. Evidence for elevated
expression of each acr operon was found as indicated by the fold expression
reported in Table 4.

Elevated transcription of the gal operon genes at minute 17 was observed
in the strain bearing the sdiA amplification. These genes, moderately expressed
when strain MG1655 was grown in LB medium (ranks: galE 841, galT 1512,
galK 599; Wei and LaRossa, unpublished), were elevated 3.8, 4.9 and 4.1 fold,
respectively. Nearby, at minute 16, is the ybgIJKL-nei region. The ybg genes are
organized as ybgI, ybgJ, ybgK and ybgL, in that order, followed by the nei gene.
These genes, transcribed in the same orientation, could constitute an operon since
the open reading frames are densely packed, at times overlapping. sdiA
amplification elevated expression of these genes 5.2, 4.7, 6.4, 3.8 and 8.6 fold,
respectively. nei encodes an endonuclease responsible for the excision of
oxidized pyrimidines in the double helix.

Two linked genes at minute 44, b1956 and b1957, were elevated 6.6 and
14 fold by sdiA amplification. Similarly, expression of b2017 and b2016, two
genes at minute 45 divergently transcribed from and adjacent to the his operon,
was elevated 3.8 and 3.5 fold, respectively, by the presence of the sdiA-containing
multicopy plasmid.

Mitomycin C (MMC) is a DNA damaging agent. E. coli strain MG1655
was exposed to MMC, and gene expression was compared in cells that were
harvested at 15 and 40 min post exposure. In the cells that were harvested at
15 min, very little SOS response was detected. At 40 min, expression
of 40 genes was elevated more than 2 fold relative to the control strain. Among
the 40, 13 stress response genes were induced more than 2 fold (Table 6). The
SOS genes that were induced by a 40 min exposure to MMC were recN, dinI,
sulA, lexA, recA, uvrA, dinD, priC, umuC, mioC, uvrB, ruvA, and xseA.

The SOS responsive genes are lexA-dependent. In order to determine the
gene expression patterns in the presence and the absence of the SOS response,
DM800 and DM803 were exposed to MMC for 40 min and the gene expression
profiles were compared. DM800 and DM803 harbor lexA+ and lexA ind alleles,
respectively. As expected, when exposed to MMC for 40 min, SOS responsive
genes were induced more than 2 fold in the DM800 strain. SOS responsive genes,
including lexA, were not induced in the DM803 strain (Tables 7 and 8). Many
genes that were not induced by MMC in DM800 were induced by the DNA
damaging agent in DM803. For example, expression of the following genes
was induced more than 2 fold in DM803 but not in DM800 (Tables 7 and 8):
genes involved with cell division (i.e., dicB, dicC,
and sdiA); chemotaxis and motility (i.e., cheW and motA); and the transport of
small molecules (i.e., cycA, fadL, chaC, codB and btuC).

The present invention is not limited to only highly expressed genes for
several reasons. First, reproducible expression measurements were obtained over
a wide dynamic range (Figure 1E). Second, the data of Figure 3 and Table 1
illustrate that the lac operon expression, although low before IPTG induction, was
detected, suggesting that most transcripts can be readily measured with the
described techniques. Analyses of well-characterized "promoter-down" mutants
or spiking experiments may be useful in defining the lower limits of expression
that can be observed.

EXAMPLES 

The present invention is further defined in the following Examples. It
should be understood that these Examples, while indicating preferred
embodiments of the invention, are given by way of illustration only. From the
above discussion and these Examples, one skilled in the art can ascertain the
essential characteristics of this invention, and without departing from the spirit
and scope thereof, can make various changes and modifications of the invention to
adapt it to various usages and conditions.
GENERAL METHODS 

Standard recombinant DNA and molecular cloning techniques used in the
Examples are well known in the art and are described by Sambrook, J., Fritsch,
E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring
Harbor Laboratory Press: Cold Spring Harbor, (1989) (Maniatis) and by T. J.
Silhavy, M. L. Berman, and L. W. Enquist, Experiments with Gene Fusions, Cold
Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1984) and by Ausubel,
F. M. et al., Current Protocols in Molecular Biology, pub. by Greene Publishing
Assoc. and Wiley-Interscience (1987).

The meaning of abbreviations is as follows: "hr" means hour(s), "min"
means minute(s), "sec" means second(s), "d" means day(s), "mL" means
milliliter(s), "µL" means microliter(s), "nL" means nanoliter(s), "µg" means
microgram(s), "ng" means nanogram(s), "mM" means millimolar, "µM" means
micromolar.

Media and Culture Conditions: 

Materials and methods suitable for the maintenance and growth of
bacterial cultures were found in Experiments in Molecular Genetics (Jeffrey H.
Miller), Cold Spring Harbor Laboratory Press (1972), Manual of Methods for
General Bacteriology (Phillip Gerhardt, R.G.E. Murray, Ralph N. Costilow,
Eugene W. Nester, Willis A. Wood, Noel R. Krieg and G. Briggs Phillips, eds),
pp. 210-213, American Society for Microbiology, Washington, DC, or Thomas D.
Brock in Biotechnology: A Textbook of Industrial Microbiology, Second Edition
(1989) Sinauer Associates, Inc., Sunderland, MA. All reagents and materials used
for the growth and maintenance of bacterial cells were obtained from Aldrich
Chemicals (Milwaukee, WI), DIFCO Laboratories (Detroit, MI), Gibco/BRL
(Gaithersburg, MD), or Sigma Chemical Company (St. Louis, MO) unless
otherwise specified.

LB medium contains the following per liter of medium: Bacto-tryptone (10 g),
Bacto-yeast extract (5 g), and NaCl (10 g).

Minimal M9 medium contains the following per liter of medium: Na2HPO4
(6 g), KH2PO4 (3 g), NaCl (0.5 g), and NH4Cl (1 g).

The above media were autoclaved for sterilization, then 10 mL of 0.01 M
CaCl2 and 1 mL of MgSO4·7H2O plus carbon source and other nutrients were
added as mentioned in the examples. All additions were pre-sterilized before they
were added to the media.

Molecular Biology Techniques : 

Restriction enzyme digestions, ligations, transformations, and methods for
agarose gel electrophoresis were performed as described in Sambrook, J., et al.,
Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor
Laboratory Press (1989). Polymerase Chain Reaction (PCR) techniques were
found in White, B., PCR Protocols: Current Methods and Applications, Volume
15 (1993) Humana Press Inc.

EXAMPLE 1 

Example 1 demonstrates genomic DNA amplification and the preparation
of the high density DNA array.

Amplification of 4290 E. coli genes. Specific primer pairs (available from Sigma
Genosys Biotechnologies, The Woodlands, TX) for each protein-specifying gene of
E. coli were used in two consecutive PCR amplification reactions. Genomic DNA
(30 ng) was used as the template in the first round of PCR amplification, and
500-fold diluted PCR products served as templates for PCR re-amplification.
Duplicate 50 µL scale reactions were performed. The PCR reactions were
catalyzed with ExTaq™ polymerase (Panvera, Madison, WI) with the four
dNTPs (Pharmacia) present at 0.25 mM and the primers at 0.5 µM. Twenty-five
cycles of denaturation at 95°C for 30 sec, annealing at 64°C for 30 sec and
polymerization at 72°C for 2 min were conducted. A 2 µL aliquot of each PCR
product was sized by electrophoresis through agarose gels. More than 95% of the
second round PCR products displayed visible bands of the correct size. Second
round PCR reactions devoid of templates and primers were saved to serve as
negative controls for hybridization capture reagents. One third of each second
round PCR reaction was purified using 96-well PCR purification kits (Qiagen,
Valencia, CA). The eluted DNAs were dried using a vacuum centrifuge.
Arraying amplified genes. Twenty microliters of 6 M NaSCN or 50% DMSO
was added to each dried DNA sample (> 0.1 ng/nL). A generation II DNA spotter
(Molecular Dynamics, Sunnyvale, CA) was used to array the samples onto coated
glass slides (Amersham Pharmacia Biotech, Arlington Heights, IL). Aliquots of
approximately 1 nL from 1536 resuspended PCR products were arrayed in
duplicate on each slide; a set of three slides supported all amplified E. coli genes.
To serve as controls, 76 specific E. coli PCR products, 8 amplified genes of
Klebsiella pneumoniae and 12 plant cDNA clones were also spotted onto each
slide. Spotted glass slides, after baking at 80°C for 2 hr, were stored under
vacuum in a desiccator at room temperature.
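
By way of illustration only, the slide count stated above can be checked with a short Python sketch. The function name is hypothetical, and the treatment of the 96 controls as occupying positions within the 1536 products per slide is an assumption; the text does not say whether the controls are counted inside or outside that figure.

```python
import math


def slides_needed(n_products, positions_per_slide, controls_per_slide):
    """Slides required when every slide repeats the same set of controls."""
    usable = positions_per_slide - controls_per_slide
    if usable <= 0:
        raise ValueError("controls leave no room for gene products")
    return math.ceil(n_products / usable)


# Figures from the text: 4290 genes, 1536 products per slide, and
# 76 + 8 + 12 control products repeated on each slide.
CONTROLS = 76 + 8 + 12

# Assumption: the controls occupy positions within the 1536 (worst case).
n_slides = slides_needed(4290, 1536, CONTROLS)

# Each product is arrayed in duplicate, so spots per slide are doubled.
spots_per_slide = 1536 * 2

print(n_slides, spots_per_slide)
```

Under either reading of the control layout (controls inside or outside the 1536 positions) the calculation gives three slides, in agreement with the set of three slides described.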

EXAMPLE 2 

Example 2 demonstrates gene expression analysis. E. coli mRNA was
isolated, fluorescently labeled cDNA was prepared using mRNA as a template, and
the labeled cDNA was hybridized to the high density DNA array. The amount of
DNA hybridized to the DNA array was quantitated and analyzed.
Microbiological Methods.
E. coli MG1655 was cultured with aeration in either the minimal medium,
M9 (Miller, J. H., Experiments in Molecular Genetics, Cold Spring Harbor
(1972)), supplemented with 0.4% glucose or in the rich medium, LB (Miller, J. H.,
Experiments in Molecular Genetics, Cold Spring Harbor (1972)), at 37°C. The
overnight culture was diluted 250 fold into fresh medium and aerated by shaking
at 37°C. Samples of the minimal medium culture were harvested at A600=0.40
(exponential phase) and 1.6 (transition to stationary phase) prior to RNA isolation.
An IPTG induction (Miller, J. H., Experiments in Molecular Genetics, Cold
Spring Harbor (1972)) was performed to examine the specificity with which it
affects gene expression. A culture grown overnight in LB at 37°C was diluted
250 fold into fresh LB and aerated at 37°C. When the culture achieved an
appropriate density (A600=0.40), it was split. To one portion was added IPTG to a
final concentration of 1 mM; the untreated sample served as a control. Incubation
of both samples was continued with aeration at 37°C for another 15 min
(A600=0.45 for both cultures) before RNA isolation was initiated.

RNA Isolation. An equivalent volume of shaved ice was added to 50 mL samples,
which were pelleted immediately in a refrigerated centrifuge by spinning at
10,410 x g for 2 min. Each resultant pellet was resuspended in a mixture
containing 100 µL of Tris HCl (10 mM, pH 8.0) and 350 µL of β-mercaptoethanol
supplemented RLT buffer [Qiagen RNeasy Mini Kit, Valencia, CA]. The cell
suspension was added to a chilled 2 mL screw-capped microfuge tube
containing 100 µL of 0.1 mm zirconia/silica beads (Biospec Products Inc.,
Bartlesville, OK). The cells were broken by agitation at room temperature for
25 sec with a Mini-Beadbeater(TM) (Biospec Products Inc., Bartlesville, OK).
Debris was pelleted by centrifugation for 3 min at 16,000 x g and 4°C; the
resultant supernatant was mixed with 250 µL of ethanol. This mixture was loaded
onto a column from the Qiagen RNeasy Mini Kit. RNA isolation was completed
using the protocol supplied with this kit. Incubation for 1 hr at 37°C in 40 mM
Tris pH 8.0, 10 mM NaCl, 6 mM MgCl2 with RNase free RQ1 DNase (1 unit/µL,
Promega, Madison, WI) digested any genomic DNA contaminating the RNA
preparation. The digestion products were purified by a second passage through
the RNeasy protocol (Qiagen, Valencia, CA). The product was eluted from the
column in 50 µL RNase-free water prior to determining sample concentration by
an A260 reading. RNA preparations were stored frozen at either -20 or -80°C.
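
As a hedged illustration of how an A260 reading may be turned into a sample concentration, the following Python sketch applies the widely used approximation that one A260 unit corresponds to about 40 µg/mL of RNA at a 1 cm path length. That factor, the function names, and the example reading are assumptions and are not taken from the text.

```python
# Assumption: 1 A260 unit corresponds to about 40 ug/mL of RNA (1 cm path).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_conc_ug_per_ul(a260, dilution_factor=1.0):
    """Estimate RNA concentration (ug/uL) from an A260 reading of a diluted sample."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("a260 must be >= 0 and dilution_factor > 0")
    return a260 * dilution_factor * RNA_UG_PER_ML_PER_A260 / 1000.0


def volume_for_mass_ul(mass_ug, conc_ug_per_ul):
    """Volume (uL) of stock that contains the requested mass of RNA."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ug / conc_ug_per_ul


# Hypothetical reading: A260 = 0.25 measured on a 50-fold dilution.
conc = rna_conc_ug_per_ul(0.25, dilution_factor=50)
vol_for_6_ug = volume_for_mass_ul(6.0, conc)
print(f"{conc:.2f} ug/uL; {vol_for_6_ug:.1f} uL holds 6 ug")
```

The second helper shows how the estimated concentration could be used to measure out a fixed mass of RNA template for a labeling reaction.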

Synthesis of fluorescent cDNA from total RNA. Six micrograms of RNA template
and 12 µg of random hexamer primers (Operon Technologies, Inc., Alameda, CA)
were diluted with double distilled (dd) water to a volume of 22 µL. Annealing
was accomplished by incubation at 70°C for 10 min followed by 10 min at room
temperature. In order were added: 8 µL of 5x Superscript II reaction buffer (Life
Technologies, Inc., Gaithersburg, MD), 4 µL of 0.1 M DTT, 2 µL of the dNTP mix
(2 mM dATP, 2 mM dGTP, 2 mM TTP, 1 mM dCTP), 2 µL of 0.5 mM Cy3- or
Cy5-dCTP (Amersham Pharmacia Biotech, Arlington Heights, IL), and 2 µL of
Superscript II reverse transcriptase (200 units/µL, Life Technologies Inc.,
Gaithersburg, MD). DNA synthesis proceeded at 42°C for 2.5 hr before the
reaction was terminated by heating at 94°C for 5 min. Alkaline hydrolysis of the
RNA templates was achieved by adding 2 µL of 5 M NaOH followed by
incubation at 37°C for 10 min. Hydrolysis was terminated by the sequential
addition of 3 µL of 5 M HCl and 5 µL of 1 M Tris-HCl, pH 6.8. The labeled
cDNA was purified with a PCR purification kit (Qiagen, Valencia, CA), dried in a
speed vacuum and stored at -20°C. Labeling efficiency was monitored using
either A550, for Cy3 incorporation, or A650, for Cy5 labeling, to A260 ratios.
Fluorescent labeling of genomic DNA. Genomic DNA, isolated from strain
MG1655 (Bachmann, B., Escherichia coli and Salmonella: Cellular and
Molecular Biology, ASM Press (1996)) by standard procedures (Van Dyk and
Rosson, Methods in Molecular Biology: Bioluminescence Methods and Protocols,
Humana Press Inc. (1998)), was nebulized to approximately 2 kb pair fragments.
Three micrograms of this DNA was mixed with 6 µg of random hexamer primers
(Operon Technologies, Inc., Alameda, CA) in 33 µL of dd water. DNA was
denatured by heating at 94°C prior to annealing on ice for 10 min. Fluorescent
copying of the genomic DNA was accomplished using the Klenow fragment of
DNA polymerase I (5 µg/µL, Promega, Madison, WI). To the DNA mixture was
added 6 µL of 10x Klenow buffer (supplied with the enzyme), 3 µL of the dNTP
mix described above, 12 µL dd H2O, 3 µL of 0.5 mM Cy3-dCTP (Amersham
Pharmacia Biotech, Arlington Heights, IL), and 3 µL of the Klenow fragment of
DNA polymerase I. After a static, 2.5 h incubation at room temperature, the
labeled DNA probe was purified using a PCR purification kit (Qiagen, Valencia,
CA) before drying in a speed vacuum.

Hybridization and washing. Spotted slides were placed in isopropanol for 10 min,
boiled in dd H2O for 5 min and dried by passage of ultra-clean N2 gas prior to
pre-hybridization. The prehybridization solution (PHS) was 3.5xSSC (BRL, Life
Technologies Inc., Gaithersburg, MD), 0.2% SDS (BRL, Life Technologies Inc.,
Gaithersburg, MD), 1% bovine serum albumin (BSA, Fraction V, Sigma, St.
Louis, MO). The hybridization solution (HS) contained 4 µL of dd water, 7.5 µL
of 20xSSC, 2.5 µL of 1% SDS (BRL, Life Technologies Inc., Gaithersburg, MD),
1 µL of 10 mg/mL salmon sperm DNA (Sigma, St. Louis, MO) and 15 µL of
formamide (Sigma, St. Louis, MO). The slides were incubated at 60°C for 20 min
in PHS. The slides were next rinsed 5 times in dd water at room temperature and
twice in isopropanol before drying by the passage of nitrogen. The dried probe
was resuspended in the HS and denatured by heating at 94°C for 5 min.
Thirty microliters of the probe-containing HS was applied to a dried,
pre-hybridized slide, covered with a cover slip (Corning, Corning, NY), and put
into a sealed hybridization chamber containing a small reservoir of water to
maintain moisture. Hybridization occurred for approximately 14 h at 35°C.
Cover slips were removed in washing buffer I (WB I = 2xSSC, 0.1% SDS)
warmed to 35°C prior to incubation for 5 min. Next, the slides were washed
sequentially for 5 min in 1xSSC, 0.1% SDS and 0.1xSSC, 0.1% SDS. Slides were
then passed through three baths, each passage lasting 2 min, in 0.1xSSC. The
slides were dried with a nitrogen gas flow.
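
For illustration, the final composition of the hybridization solution can be worked out from the component volumes given above, reading each volume as microliters. The Python sketch below is not part of the described method; the recipe table layout and function name are hypothetical, and neat formamide is assumed to be a 100% stock.

```python
# (name, volume in uL, stock concentration, unit); water has no stock value.
# Assumption: neat formamide is treated as a 100% stock.
HS_RECIPE = [
    ("dd water", 4.0, None, ""),
    ("SSC", 7.5, 20.0, "x"),
    ("SDS", 2.5, 1.0, "%"),
    ("salmon sperm DNA", 1.0, 10.0, "mg/mL"),
    ("formamide", 15.0, 100.0, "%"),
]


def final_concentrations(recipe):
    """Return the total volume and each component's final concentration."""
    total = sum(vol for _, vol, _, _ in recipe)
    finals = {
        name: (vol * stock / total, unit)
        for name, vol, stock, unit in recipe
        if stock is not None
    }
    return total, finals


total_ul, finals = final_concentrations(HS_RECIPE)
print(f"total volume: {total_ul:g} uL")
for name, (value, unit) in finals.items():
    print(f"{name}: {value:.3g} {unit}")
```

The component volumes sum to 30 µL, matching the thirty microliters applied to each slide, and the dilution arithmetic gives 5x SSC and 50% formamide in the final solution.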

Data Collection and Analysis. Hybridization to each slide was quantified with a
confocal laser microscope (Molecular Dynamics, Sunnyvale, CA) whose
photomultiplier tube was set to 700 volts and 800 volts for obtaining Cy3 and Cy5
signals, respectively. The images were analyzed with ArrayVision 4.0 software
(Imaging Research, Inc., Ontario, Canada). The fluorescent intensity associated
with each spotted gene was reduced by subtracting the fluorescence of an
adjoining, non-spotted region of the slide. These readings were exported to a
spreadsheet for further manipulation. The four "no DNA" spots derived from
PCR reactions devoid of template were controls used to determine the noise
(background signal) level.
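
The background correction and noise determination just described might be sketched in Python as follows. This is an illustrative sketch only: the text does not say how the four "no DNA" readings were combined, so taking their mean is an assumption, and all gene names and readings are hypothetical.

```python
from statistics import mean


def subtract_local_background(spot_signal, local_background):
    """Spot fluorescence minus that of an adjoining non-spotted region."""
    return spot_signal - local_background


def noise_level(no_dna_readings):
    """Assumption: noise is taken as the mean corrected 'no DNA' reading."""
    return mean(no_dna_readings)


# Hypothetical raw readings: gene -> (spot, adjoining background).
raw = {
    "gene_a": (1250.0, 50.0),
    "gene_b": (95.0, 60.0),
    "gene_c": (4020.0, 20.0),
}
no_dna = [30.0, 45.0, 35.0, 50.0]  # four 'no DNA' spots, already corrected

corrected = {g: subtract_local_background(s, b) for g, (s, b) in raw.items()}
noise = noise_level(no_dna)

# Keep only genes whose corrected signal exceeds the noise level.
detected = {g: v for g, v in corrected.items() if v > noise}
print(noise, sorted(detected))
```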

The 96 genes present on each slide were used as internal controls to
quantify signal intensities yielding equivalent readings among the three slides of a
whole genome array set. This corrected for slide-to-slide signal variation.
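
One plausible way to derive equivalent readings from shared controls is to scale each slide so that its mean control signal matches the mean across the slide set. The Python sketch below illustrates that idea only; the exact calculation used is not given in the text, so the scaling rule, function names, and readings are all assumptions.

```python
def slide_scale_factors(control_readings):
    """control_readings maps slide -> readings of the shared control spots."""
    slide_means = {s: sum(v) / len(v) for s, v in control_readings.items()}
    # Target is the average of the per-slide control means.
    target = sum(slide_means.values()) / len(slide_means)
    return {s: target / m for s, m in slide_means.items()}


def to_equivalent_readings(gene_readings, factor):
    """Apply one slide's scale factor to all of its gene readings."""
    return {g: v * factor for g, v in gene_readings.items()}


# Hypothetical controls (only three of the 96 shown per slide).
controls = {
    "slide_1": [50.0, 100.0, 150.0],
    "slide_2": [100.0, 200.0, 300.0],
    "slide_3": [150.0, 300.0, 450.0],
}
factors = slide_scale_factors(controls)
slide_1_genes = to_equivalent_readings({"gene_x": 80.0}, factors["slide_1"])
print(factors, slide_1_genes)
```

After scaling, the control means of all three slides coincide, so readings for genes spotted on different slides can be compared on one scale.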

For the IPTG induction experiment, it was presumed that the overall
transcriptional pattern did not change significantly. Thus the summed equivalent
reading for the entire genome was quantified; analogous quantitation of the
underlying equivalent readings allowed calculation of fold induction of each
gene's expression by comparison of such quantified equivalent readings.
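
The fold induction calculation can be illustrated under one reading of the paragraph above: each gene's reading is expressed as a fraction of the summed reading for its sample, and the treated fraction is divided by the control fraction. This interpretation, the function names, and the readings below are assumptions; the sketch is not a statement of the exact procedure used.

```python
def scale_to_total(readings):
    """Express each reading as a fraction of the summed reading."""
    total = sum(readings.values())
    if total <= 0:
        raise ValueError("summed reading must be positive")
    return {g: v / total for g, v in readings.items()}


def fold_induction(treated, control):
    """Ratio of treated to control fractions for genes seen in the control."""
    t, c = scale_to_total(treated), scale_to_total(control)
    return {g: t[g] / c[g] for g in t if c.get(g, 0) > 0}


# Hypothetical equivalent readings for a three-gene "genome".
control = {"lacZ": 10.0, "gene_a": 495.0, "gene_b": 495.0}
treated = {"lacZ": 200.0, "gene_a": 900.0, "gene_b": 900.0}

folds = fold_induction(treated, control)
for gene, fold in folds.items():
    print(f"{gene}: {fold:.2f} fold")
```

Scaling by the summed reading removes differences in overall labeling or scanning intensity between the two samples, which is reasonable only under the stated presumption that the overall transcriptional pattern is unchanged.
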
RNA abundance. To convert normalized equivalent readings into measures of
transcript abundance, a further correction was needed. That correction required
the hybridization signal arising from an equimolar concentration of all transcripts.
The surrogate for this transcript pool was the fluorescent copy of genomic DNA.
Thus, the fluorescent intensities from hybridization with RNA-derived probes
were corrected using fluorescent intensities arising from genomic DNA derived
probes. Specifically, the abundance of each gene's transcription product(s) was
determined by dividing the normalized equivalent reading of the genomic DNA
derived sample into the normalized equivalent reading from the RNA derived
sample. The convention of Riley (Riley and Labedan, Escherichia coli and
Salmonella: Cellular and Molecular Biology, ASM Press (1996)) was followed in
grouping genes into functional sets.
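
The abundance calculation described above reduces to a per-gene division, which can be sketched in Python as shown below. The function names and readings are hypothetical, and the ranking helper is an added convenience rather than something specified in the text.

```python
def transcript_abundance(rna, gdna):
    """Divide each RNA-derived reading by the genomic-DNA-derived reading."""
    # Genes with no usable genomic DNA signal are skipped to avoid dividing by zero.
    return {g: rna[g] / gdna[g] for g in rna if gdna.get(g, 0) > 0}


def rank_genes(abundance):
    """Rank 1 is the most abundant transcript."""
    ordered = sorted(abundance, key=abundance.get, reverse=True)
    return {g: i for i, g in enumerate(ordered, start=1)}


# Hypothetical normalized equivalent readings.
rna = {"gene_a": 0.004, "gene_b": 0.001, "gene_c": 0.002}
gdna = {"gene_a": 0.002, "gene_b": 0.002, "gene_c": 0.0}

abundance = transcript_abundance(rna, gdna)
ranks = rank_genes(abundance)
print(abundance, ranks)
```

Dividing by the genomic DNA signal compensates for spot-to-spot differences in the amount of arrayed DNA and in hybridization efficiency, since every gene is present at the same molar concentration in the genomic probe.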

EXAMPLE 3 

Example 3 demonstrates gene expression profile changes when cells were
exposed to IPTG, or grown in different culture media. The results are illustrated
in Tables 1, 2 and 3 (Listing of Tables) as described above.

IPTG Induction. E. coli strain MG1655 was grown overnight in LB at 37°C.
The culture was diluted 250 fold into fresh LB and aerated at 37°C. When the
culture achieved an appropriate density (A600=0.40), it was split into two
portions.

To one portion, IPTG was added to a final concentration of 1 mM. The
other portion was untreated and served as a control.

Both samples were incubated with aeration at 37°C for another 15 min
(A600=0.45 for both cultures) before RNA isolation. Gene expression analysis
was performed as described in Examples 1 and 2.

Cells were grown in different culture media. E. coli MG1655 was cultured
with aeration overnight in either the minimal medium, M9, supplemented with
0.4% glucose or in the rich medium, LB, at 37°C. The overnight culture was
diluted 250 fold into fresh medium and aerated by shaking at 37°C. Samples of
the minimal medium culture were harvested at A600=0.40 (exponential phase) and
A600=1.6 (transition to stationary phase) prior to RNA isolation. The LB culture
was harvested at A600=0.4 prior to RNA isolation. Gene expression analysis was
performed as described in Examples 1 and 2.

EXAMPLE 4 

Example 4 demonstrates gene expression changes and the effect on
motility when the sdiA gene was overexpressed in E. coli. The results are tabulated
in Tables 4 and 5 (Listing of Tables) as described above.

The following plasmids and strains were used in this example. 



strain or plasmid    genotype

MG1655               rph-1
RFM443               rpsL galK2 lacΔ74
pUC19                Cloning vector
pDEW140              pUC19 + sdiA (EcoRI)

Strains and growth conditions 

Strains MG1655 (Bachmann, B., Escherichia coli and Salmonella:
Cellular and Molecular Biology, ASM Press (1996)) and RFM443 (Menzel, R.,
Anal. Biochem., 181:40-50 (1989)) have been described.

pDEW140 was constructed as follows. Chromosomal DNA isolated
from E. coli W3110 was partially digested with restriction enzyme Sau3AI and
size fractionated on agarose gels. Fractions of two size ranges (average sizes of
approximately 2.5 and 4.0 Kbp) were ligated to pBR322 (0.11 pmol) or pUC18
(0.11 pmol) that had previously been digested with restriction enzyme BamHI and
treated with calf intestinal alkaline phosphatase. The molar ratio of chromosomal
DNA to vector in each of the ligation reactions was approximately 0.2:1. The
ligation products were used to transform ultracompetent E. coli XL2Blue
(Stratagene) to ampicillin resistance. Pooled transformants (>10^ for each
transformation) were used to isolate plasmid DNA.
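
As a worked illustration of the ligation stoichiometry, the insert mass implied by a 0.2:1 molar ratio with 0.11 pmol of vector can be estimated in Python. The sketch assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, a common rule of thumb that is not stated in the text; the function names are hypothetical.

```python
# Assumption: average mass of one base pair of double-stranded DNA,
# about 650 g/mol (a common rule of thumb, not stated in the text).
G_PER_MOL_PER_BP = 650.0


def insert_pmol(vector_pmol, insert_to_vector_ratio):
    """Picomoles of insert for a given insert:vector molar ratio."""
    return vector_pmol * insert_to_vector_ratio


def dsdna_mass_ng(pmol, length_bp):
    """Mass in ng of a double-stranded fragment of the given length."""
    # pmol * (g/mol) gives picograms; divide by 1000 for nanograms.
    return pmol * length_bp * G_PER_MOL_PER_BP / 1000.0


pmol = insert_pmol(0.11, 0.2)
mass_2_5_kb = dsdna_mass_ng(pmol, 2500)
mass_4_0_kb = dsdna_mass_ng(pmol, 4000)
print(f"{pmol:.3f} pmol insert: {mass_2_5_kb:.2f} ng (2.5 kb), "
      f"{mass_4_0_kb:.1f} ng (4.0 kb)")
```

Under these assumptions the two size fractions correspond to roughly 36 ng and 57 ng of insert DNA per ligation.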

0.3 ng of the pUC18 library was electro-transformed into RFM443. The
MMC resistant clones were selected on LB agar plates supplemented with
100 µg/mL of ampicillin and 6 mg/mL of MMC. Resistant colonies appeared
after the incubation at 37°C. The colonies underwent single colony purification
on the same medium. Plasmids derived from single colonies were isolated with
the Qiagen 96-well turbo plasmid prep kit. These plasmids served as a template
for primer-directed DNA sequencing of the insert ends. One of the plasmids,
plasmid p[3+4/B10], was shown by sequencing to carry sdiA and surrounding
genes. From this plasmid sdiA was amplified by PCR using the primers:
f primer = TGGCA CGCAG GACAG AA (SEQ ID NO:1)
d primer = TAACA AATCA GCATA ACTCA T (SEQ ID NO:2)

The PCR used Ampli-Taq Gold. Conditions were 94°C for 11 min, followed
by 32 cycles of 94°C for 45 sec, 45°C for 45 sec, 72°C for 90 sec, then 72°C for
7 min.

The PCR product was blunt end ligated into EcoRV digested pT7Blue-3
(Novagen). A clone having the proper sized fragment was obtained after
transformation into DH5-alpha. From colonies, inserts of the proper size were
detected by PCR-based analysis. Such colonies served as a source of plasmid
DNA from which sdiA was liberated by digestion with EcoRI. The fragment was
sized by electrophoresis through agarose gels and ligated into EcoRI digested
pUC19. The ligation mixture was used to transform DH5-alpha. Plasmid preps of
the transformants were sequenced. One such plasmid containing sdiA was named
pDEW140 and transformed into strain RFM443.

Plasmids pUC19 and pDEW140 were transformed into RFM443, selecting
for ampicillin resistance on solidified LB agar medium.

Strains RFM443 (pUC19) and RFM443 (pDEW140) were grown
overnight with aeration in LB with 150 µg/mL ampicillin (LB with amp). The
overnight culture was diluted 250 fold into fresh medium (LB with amp) and
incubated further at 37°C with shaking. Cells were collected at O.D.600=0.45,
and total RNA was purified using the Qiagen RNeasy Mini Kit.

Motility experiment.

A single colony was picked from freshly grown RFM443 (pUC19) or
RFM443 (pDEW140) cultured on LB agar (1.2%), and the center of an LB with
amp soft agar (0.3%) plate was stabbed. The soft agar plate containing each
culture was incubated at 37°C. The diameters of the growth zones of the two
strains were measured and compared.

EXAMPLE 5 

Example 5 demonstrates the differences in gene expression profile
between strains proficient or deficient in their ability to respond to DNA
damaging agents. An isogenic pair of strains, differing only in lexA, was used to
investigate the cell's range of responses to the DNA damaging agent mitomycin C
(MMC). The results are tabulated in Tables 6, 7, and 8 (Listing of Tables) as
described above.
Strains. E. coli strain MG1655 was used to determine the gene expression profile
of E. coli in response to an MMC challenge. Two isogenic E. coli strains (Mount
et al., J. Bacteriol. 112:886-893 (1972)), DM800 (lexA+), used as a control
displaying a normal response to DNA damage, and DM803 (lexA ind), a strain
unable to mount the predominant "SOS" response to DNA damage, were
compared using comprehensive gene expression profiling.
MMC experiment. MG1655 cells were grown in LB overnight with aeration. The
overnight cultures were diluted 100 fold in LB to a final volume of 500 mL and
grown at 37°C to exponential phase. 200 mL of culture was treated with MMC to
a final concentration of 250 ng/mL. Another 200 mL of culture was mock
treated without MMC for comparison. Cells were harvested at 15 min and 40 min
for the MG1655 strain. With the DM800 and DM803 strains, cells, cultured in an
identical manner, were harvested after 40 min exposure. RNA was isolated and
the gene expression profile was analyzed as shown in Examples 1 and 2. As seen in
Tables 7 and 8, the lexA allele has a great influence on the response of cells to
MMC. Table 8 shows that the strain deficient in the SOS response still responds to
MMC, but in a different manner.

EXAMPLE 6 

Preparation of Synechocystis sp. PCC6803 cDNA Probes

This example describes the construction of Synechocystis sp. PCC6803
cDNA probes following growth of the cells in either minimal growth media
(control) or minimal media plus UV-B light treatment. The prepared cDNA
probes are used to determine gene expression patterns of many genes
simultaneously on a Synechocystis sp. PCC6803 DNA microarray as described in
Examples 7 and 8 below.

Hybridization of Microarray Slides and Quantitation of Gene Expression 

Microarray glass slides were treated with isopropanol for 10 min, boiling
double distilled water for 5 min, then treated with blocking buffer (3.5 x SSC,
0.2% SDS, 1% BSA) for 20 min at 60°C, rinsed five times with double distilled
water, then twice with isopropanol, followed by drying under nitrogen. Cy3
labeled cDNA probes prepared from the total RNA of the UV-B treated
Synechocystis culture, mixed with an equal amount of Cy5 labeled cDNA probes
prepared from the total RNA of the untreated Synechocystis culture, were applied
to the glass slide in a total volume of 30 µL. The hybridization was repeated
using Cy5 labeled cDNA probes prepared from total RNA of the UV-B treated
Synechocystis culture mixed with an equal amount of Cy3 labeled cDNA probes
prepared from the total RNA of the untreated culture, and applied to a second
glass slide in a total volume of 30 µL. The hybridization reactions on the glass
slides were performed for 16 hr at 42°C, in a humidified chamber. Hybridized
slides were washed in 1X SSC (0.15 M NaCl, 0.015 M sodium citrate), 0.1% SDS
for 5 min at 42°C; 0.1X SSC, 0.1% SDS for 5 min at 42°C; three washes in 0.1X
SSC for 2 min at room temperature; rinsed with double distilled water and
isopropanol; and dried under nitrogen. The slides were scanned using a Molecular
Dynamics laser scanner for imaging of Cy3 and Cy5 labeled cDNA probes. The
images were analyzed using ArrayVision Software (Molecular Dynamics,
Imaging Research) to obtain fluorescence signal intensities of each spot (each
ORF on the array) to quantitate gene expression. The ratio between the signals in
the two channels (red:green) is calculated and the relative intensity of Cy5/Cy3
probes for each spot represents the relative abundance of specific mRNAs in each
sample.

Synechocystis Strain and Culture Methods

Briefly, Synechocystis sp. PCC6803 cells were grown at 30 µE s^-1 m^-2 light
intensity in a minimal growth media, BG-11 (Catalog # C-3061, Sigma Chemical
Co., St. Louis, MO) at 30°C, with shaking at 100 rpm with 5% CO2.
Fifty milliliters of Synechocystis cells grown to mid logarithmic phase (OD730nm
= 0.8 to 1.0) were divided into two 25 mL cultures and transferred from the
Erlenmeyer growth flask to two 100 mL petri dishes. The petri dishes, with the
lids on, were placed on a rotary shaker and shaken at 100 rpm.
Cell Treatments

For the control, the petri dishes comprising the Synechocystis cells were
placed on a rotary shaker with the lids on, and shaken at 100 rpm. For the UV-B
treated group, the petri dishes comprising the Synechocystis cells were placed on a
rotary shaker with the lids on, and shaken at 100 rpm. A UV-B lamp (302 nm)
was positioned above the petri dishes and the distance between the UV-B light
source and the petri dishes was adjusted to give the desired level of UV-B light
intensity. The level of UV-B light intensity was measured at the surface of the
cell culture using a UV light meter, following the manufacturer's instructions.
UV-B treatment was performed for either 20 min or 120 min. Following UV-B
irradiation, the cells were immediately cooled on ice and their RNA isolated as
described below.

Total RNA Isolation and cDNA Probe Synthesis 

Control-treated Synechocystis cells and UV-B treated Synechocystis cells
were cooled rapidly on ice and centrifuged at 4000 rpm for 5 min. Total RNA
samples were isolated using the Qiagen RNeasy Mini Kit (Qiagen), following the
manufacturer's protocol. RNase A digestion was performed as described in the
protocol, and a second round purification was performed using the RNeasy Mini
Kit. The purified total RNA was analyzed by agarose gel electrophoresis.

From each total RNA preparation, both Cy3 and Cy5 fluorescent dye
labeled cDNA probes were prepared. To synthesize the Cy3 or Cy5 labeled
cDNA probes, a reverse transcription reaction was performed using 10 µg total
RNA, 12 µg random hexamer (Ambion), 50 µM each of dATP, dGTP, dTTP, 25 µM
of dCTP, and 15 µM Cy3-dCTP or 22 µM Cy5-dCTP (Amersham Pharmacia
Biotech), DTT, and AMV reverse transcriptase (Gibco BRL). The reaction was
carried out at 42°C for 2.5 hr. After the labeling reaction, RNA templates were
degraded by alkaline hydrolysis and the cDNA probes were purified using a Qiagen
PCR purification kit. The purified probes were quantitated by measuring the
absorbance at 260 nm, 550 nm (Cy3 dye incorporation) and 650 nm (Cy5 dye
incorporation). Prior to hybridization, 100-200 pmol of the purified Cy3 or Cy5
labeled cDNA probes were dried under vacuum, and re-dissolved in the
hybridization buffer (5x SSC, 50% formamide, 0.1% SDS, and 0.03 mg/mL
salmon sperm DNA).

EXAMPLE 7 

Analysis of Synechocystis sp. PCC6803 Gene Expression in Minimal Media

Using a Synechocystis sp. PCC6803 DNA microarray prepared according
to the methods described above and the cDNA probes prepared as described in
Example 6, Applicants have identified herein promoters that can be employed for
engineering high levels of gene expression in Synechocystis sp. PCC6803, other
Synechocystis species, Synechococcus, and like organisms. This Example
describes the identification of the most highly expressed genes and their
corresponding strong promoters in Synechocystis sp. PCC6803 when grown in
BG-11 media containing 5 mM glucose as described above.

Specifically, a DNA microarray was prepared according to the methods
described above using DNA isolated from Synechocystis sp. PCC6803 cells
grown in BG-11 media containing 5 mM glucose. Minimal media Synechocystis
sp. PCC6803 gene expression was determined by hybridizing this DNA
microarray as described above with fluorescent cDNA probes synthesized from
total RNA isolated from Synechocystis sp. PCC6803 cells grown in BG-11 media
containing 5 mM glucose as described in Example 6.

Briefly, for each minimal media experiment, two hybridization reactions
were performed as described above. Specifically, the first reaction used equimolar
amounts (typically 100-200 pmol) of Cy5-labeled cDNA from total RNA of the
minimal media treated sample, and Cy3-labeled cDNA probes synthesized from
Synechocystis sp. PCC6803 genomic DNA; the second reaction used Cy3-labeled
cDNA from total RNA of the minimal media treated sample, and Cy5-labeled
cDNA probes synthesized from Synechocystis sp. PCC6803 genomic DNA. The
signal intensities were quantitated as described above. To calculate the ratio of
fold induction (i.e., minimal media/genomic), the minimal media treated sample
signal intensities were divided by the signal intensities of the genomic sample. As
there were two sets of data from duplicated spotting within each slide, the total
number of gene expression measurements for each gene was four. All four
induction ratios for each gene were analyzed using an Excel program (Microsoft)
to determine the standard deviation, an indicator of the level of confidence for the
specific data set for each gene. The ratio of signal intensities represents a relative
transcription level of each gene in the same experiment. Herein, Applicants have
identified the most highly expressed genes, i.e., those genes that are under the
control of the strongest promoters, in Synechocystis under this minimal media
condition (see Table 9).
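
The ratio and standard deviation calculation described above can be sketched as follows. This is an illustrative sketch only: the intensities are made-up numbers, the function names are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the text does not state which spreadsheet function was applied.

```python
from statistics import mean, stdev


def induction_ratios(sample, reference):
    """Divide each sample intensity by the matching reference intensity."""
    if len(sample) != len(reference):
        raise ValueError("sample and reference must have the same length")
    return [s / r for s, r in zip(sample, reference)]


def summarize(sample, reference):
    """Return (mean ratio, sample standard deviation) for one gene."""
    ratios = induction_ratios(sample, reference)
    return mean(ratios), stdev(ratios)


if __name__ == "__main__":
    # Hypothetical intensities for one gene: two dye-swapped hybridizations,
    # each with duplicate spots, giving four measurements in total.
    sample_signal = [4200.0, 3900.0, 4500.0, 4100.0]
    reference_signal = [1000.0, 1000.0, 1000.0, 1000.0]
    avg, sd = summarize(sample_signal, reference_signal)
    print(f"mean ratio = {avg:.3f}, standard deviation = {sd:.3f}")
```

Combining the two dye orientations in a single average is one simple way to dampen dye-specific bias; other normalization schemes are possible.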

EXAMPLE 8

Analysis of Synechocystis sp. PCC6803 Gene Expression Following UV-B Exposure

Using a Synechocystis sp. PCC6803 DNA microarray prepared according
to the methods described above and the probes prepared as described above in
Example 6, Applicants have identified herein UV-B inducible promoters that can
be employed for engineering high levels of gene expression in Synechocystis sp.
PCC6803, other Synechocystis species, Synechococcus, and like organisms. This
Example describes the identification of the most highly UV-B responsive genes in
Synechocystis sp. PCC6803 when grown under minimal media conditions and
exposed to 20 minutes of UV-B irradiation at 20 µE s⁻¹ m⁻² intensity. These UV
inducible promoters can be used to control expression of certain proteins that may be
toxic to Synechocystis cells.

Specifically, a DNA microarray was prepared according to the methods
described above using DNA isolated from Synechocystis sp. PCC6803. For each
UV-B treatment experiment, two hybridization reactions were performed as
described above. In particular, the first reaction used equimolar amounts (typically
100-200 pmol) of Cy5-labeled cDNA from total RNA of the UV-B treated
sample, and Cy3-labeled cDNA from total RNA of the control sample
(Synechocystis sp. PCC6803 grown in BG11 media containing 5 mM glucose);
the second reaction used Cy3-labeled cDNA from total RNA of the UV-B treated
sample, and Cy5-labeled cDNA from total RNA of the control sample. The signal
intensities were quantitated as described above. To calculate the ratio of fold
induction (i.e., UV-B/control), the UV-B treated sample signal intensities were
divided by the signal intensities of the control sample. As there were two sets of
data from duplicated spotting within each slide, the total number of gene
expression measurements for each gene was four. All four induction ratios for
each gene were analyzed using an Excel program (Microsoft) to determine the
standard deviation, an indicator of the level of confidence for the specific data set
for each gene.

Applicants have identified herein the most highly UV-B induced genes in
Synechocystis following UV-B treatment (see Table 10). Only genes whose
expression was induced more than 4-fold by UV-B light (20 min at 20 µE s⁻¹ m⁻²
intensity) as compared to the minimal media control are listed in Table 10. The
promoters of these genes can be used to construct UV inducible expression
vectors in Synechocystis.
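
The selection rule described above (retain only genes induced more than 4-fold) can be sketched as a simple filter. This is an illustrative sketch: the function name is hypothetical, three of the ratios are taken from Table 10, and "geneX" is an invented below-threshold entry added only to show the filter rejecting a gene.

```python
def induced_genes(ratios, threshold=4.0):
    """Return (gene, ratio) pairs above the threshold, highest ratio first."""
    selected = [(gene, r) for gene, r in ratios.items() if r > threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # UV-B/control ratios; hliB, hsp17 and psbA3 are values from Table 10,
    # geneX is a hypothetical gene that falls below the cutoff.
    uvb_ratios = {"psbA3": 4.1, "geneX": 2.5, "hliB": 22.7, "hsp17": 9.9}
    for gene, ratio in induced_genes(uvb_ratios):
        print(f"{gene}\t{ratio}")
```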

Some of the gene families induced by UV-B light include D1 protein
(psbA), phycobilisome degradation proteins (nblA, nblB), carotenoid biosynthesis
enzymes (crtD, crtD, crtQ), chaperones (clpB, ctpA, dnaJ, dnaK, htpG, hsp17),
RNA polymerase sigma factor (rpoD), superoxide dismutase (sodB), high light
inducible protein (hliA), FtsH protease, which is responsible for the degradation
of photo-damaged D1 protein (ftsH), and DNA repair enzyme (uvrC). Among the
group of UV inducible genes, there are several genes of unknown function:
ssr2016 and sll0185. Applicants' discovery has led to the first level of
functional assignment for these genes. The promoters of these genes can be used
to construct UV inducible expression vectors in Synechocystis.

A subgroup of Applicants' identified UV-B induced genes comprises two
Escherichia coli-like -35 promoter sequences in the 5' upstream untranslated
regions (UTR), including slr1604 (ftsH), slr0228 (ftsH), sll1867 (psbA3), slr1311
(psbA2), ssl0452 (nblA), ssl0453 (nblA), ssl2542 (hliA), ssr2016 (unknown
protein with homologues in green algae and plants), and sll0185 (unknown
protein). The nucleotide sequence "GTTACA" is present in the 5' untranslated
regions of psbA2, psbA3, and ssr2016 nucleic acids. The nucleotide sequence
"TTTACA" was also found to be present in the 5' UTRs of psbA2, psbA3,
ssr2016, rpoD, and ndhD2 nucleic acids.
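
A search for the two -35-like hexamers named above can be sketched as a plain string scan of an upstream region. This is a minimal sketch under stated assumptions: the function name is hypothetical and the example sequence is invented for illustration, not taken from any Synechocystis gene.

```python
MOTIFS = ("GTTACA", "TTTACA")  # the two hexamers named in the text


def find_motifs(upstream, motifs=MOTIFS):
    """Return {motif: [0-based start positions]}, including overlapping hits."""
    seq = upstream.upper()
    hits = {}
    for motif in motifs:
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            # Resume one base further on so overlapping matches are kept.
            start = seq.find(motif, start + 1)
        hits[motif] = positions
    return hits


if __name__ == "__main__":
    # Invented upstream sequence, used only to exercise the function.
    example_upstream = "AAGCGTTACATTTGCCTTTACAGGA"
    print(find_motifs(example_upstream))
```

An exact-match scan is the simplest choice; a position weight matrix would be needed to score degenerate promoter elements.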
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5 a = fraction of particular transcript/summed transcripts hybridizing to all open reading frames on the micro-arrays; 
b = genes are ranked in order of expression, with 1 being the most highly expressed gene;
MM: minimal media;
exp phase: exponential growth phase
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Table 2. Highly Expressed Genes under Three Different Culture Conditions 
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a fraction of transcripts hybridizing to specified gene/summed transcripts hybridizing to all open 
reading frames on the micro-arrays 

bold, double underlined - foldase/usher genes; bold, underlined - stress responsive genes; bold -
central metabolic enzyme-specifying genes; double underlined - biosynthetic genes; dotted underlined -
translation-associated genes; underlined - rpoS controlled genes



Table 3. Summary of three E. coli Expression Profiles 



                                       fraction in    fraction in      fraction in
                                       MM a/exp. b    MM/transition    LB/exp.
                                       phase          phase            phase

1. Cell processes
   Cell division-26 c                  0.011          0.010            0.010
   Chemotaxis, motility
      Chemotaxis and mobility-12       0.0014         0.00068          0.0011
   Folding and ushering proteins-7     0.0032         0.0061           0.011
   Transport of large molecules
      Protein, peptide secretion-32    0.0082         0.01014          0.010
   Transport of small molecules
      Amino acids, amines-49           0.0091         0.0081           0.0068
      Anions-20                        0.0029         0.0028           0.0023
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Carbohydrates, organic acids, 

alcohols-82
Cations-52
Nucleosides, purines, pyrimidines-6
Other-12

2. Elements of external origin:
Laterally acquired elements
Colicin-related functions-5
Phage-related functions and prophages-27
Plasmid-related functions-1
Transposon-related functions-34

3. Global functions
Energy transfer, ATP-proton motive force-9
Global regulatory functions-51

4. Macromolecule metabolism
Basic proteins
Basic proteins - synthesis, modification-6
Macromolecule degradation
Degradation of DNA-23
Degradation of RNA-11
Degradation of polysaccharides-3
Degradation of proteins, peptides, glyco-61
Macromolecule synthesis, modification
DNA - replication, repair, restr./modific'n-89
Lipoprotein-11
Phospholipids-11
Polysaccharides - (cytoplasmic)-6
Proteins - translation and modification-34
RNA synthesis, modification, DNA transcript'n-27
Macromolecules
Glycoprotein
Lipopolysaccharide-13
aa-tRNAs
Amino acyl tRNA syn; tRNA modific'n-40

5. Metabolism of small molecules
Amino acid biosynthesis
Biosynthesis of cofactors, carriers
Central intermediary metabolism
2-Deoxyribonucleotide metabolism-12
Amino sugars-10
Entner-Doudoroff-3
Gluconeogenesis-4
Glyoxylate bypass-5
Misc. glucose metabolism-3
Non-oxidative branch, pentose pwy-8
Nucleotide hydrolysis-2
Nucleotide interconversions-13
Phosphorus compounds-17
Polyamine biosynthesis-8
Salvage of nucleosides and nucleotides-18
Sugar-nucleotide biosynthesis, conversions-18
Sulfur metabolism-10
Pool, multipurpose conversions of intermed. met'n-46
Degradation of small molecules
Amines-9
Amino acids-17
Carbon compounds-90
Fatty acids-10
Other-8
Energy metabolism, carbon
Aerobic respiration-27
Anaerobic respiration-80
Electron transport-24
Fermentation-21
Glycolysis-18
Oxidative branch, pentose pwy-2
Pyruvate dehydrogenase-6
TCA cycle-18
Fatty acid biosynthesis
Fatty acid and phosphatidic acid biosynthesis-23
Nucleotide synthesis
Purine ribonucleotide biosynthesis-22
Pyrimidine ribonucleotide biosynthesis-10

6. Miscellaneous
Not classified-109

7. Open reading frames
Unknown proteins-1324

8. Processes
Adaptation
Adaptations, atypical conditions-16
Osmotic adaptation-14
Protection responses
Cell killing-3
Detoxification-11
Drug/analog sensitivity-32

9. Structural elements
Cell envelope
Inner membrane-4
Murein sacculus, peptidoglycan-34
Outer membrane constituents-17
Cell exterior constituents-16
Surface polysaccharides & antigens
Surface structures-57
Ribosome constituents
Ribosomal and stable RNAs-3
Ribosomal proteins - synthesis, modification-54
Ribosomes - maturation and modification-6

10. ORFs not listed-102
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0.01 1 
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0.0080 
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0.0038 
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0.012 
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0.0075 
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a MM =Minimal medium, ^exp. = exponential, number following each description is the number 
of genes summed 

Table 4. Gene expression elevated by the presence of a sdiA multi-copy plasmid. 



Genes (grouping by function) Fold induction Genes (grouping by function) Fold induction 



1. Cell processes 8. Not classified 

Cell division agal 3.3 

flsA 10. chpA 3.0 

ftsQ 8.8 dinl 2.0 

ftsZ 11 dinP 

minC 2.1 envR 2.2 

minD 2.7 ppdB 2.9 

minE 2.4 sokA 4.9 

sdiA 30. sugE 2.2 

sulA 2.6 uvrY 11.9 

Chemotaxis and motility 9. Open reading frames of 

unknown functions 

Transport of large molecules apaG 2.0 

Protein, peptide secretion hdeB 2.4 

msyB 2.1 relE 2.1 

oppA 2.1 sprT 3.8 

sapB 2.2 bOQ65 3.7 

secD 2.5 600P7 2.1 

secF 2.4 £0/55 6.4 

Transport of small molecules 60737 2.5 

Amino acids, amines b0138 2. 1 

glnN 4.0 60/*/ 4.6 

glnQ 2.5 60/53 2.8 

Carbohydrates, organic acids, ale b0189 2.3 

araE 5.0 6022* 2.1 

■ frvA 3.2 60225 6.4 

/rw£> 2.1 602J2 3.1 

gntU-J 2.0 60233 2.9 

srlB 2.1 6023* 3.1 

xylF 3.8 602*5 2.3 

Cations b0269 3.2 

6J? 2.0 b0281 2.4 

c/io/f 2.0 602P5 2.8 

feoA 5.8 60300 2.0 

fepD 2.1 60303 4.7 

/rifcC 4.1 60322 2.5 

2. Elements of external origin: 60*0* 3.0 
Transposon-related functions 60*07 2.9 

rhsC 63 b04J2 2.4 
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Table 5. Gene expression reduced by the presence of a sdiA multi-copy plasmid. 



Genes (grouping by function) Fold repression 
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Transport 
Protein, peptide secretion 

dppA 2.3 
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sdaC 3.9 
Carbohydrates, organic acids 
alcohols 

fadL 2.5 



Genes (grouping by function) Fold repression 
6. Structural elements 
Outer membrane constituents 

flu 7.7 
Cell exterior constituents 

nanA 3.7 
Surface structures 



fiihA 


2.6 
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flgB 
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flgD 


17 




17 
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flgj 
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JlgN 


5.6 
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Table 6. Gene expression profiles of MG 1655 strain when exposed to MMC 
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230.79 63.00 3.66 hybF | 85.60 616.14 0.14 hybF 
Gene names written in bold letters are SOS response genes; M15: 15 min exposure
to MMC; M40: 40 min exposure to MMC



Table 7. Gene expressions in DM800 and DM803 when exposed to MMC 



Gene name   b number   DM800 MMC   DM800 control   ratio   DM803 MMC   DM803 control   ratio

recN                   98536.3     11895.5         8.3     2454.6      2089.9          1.2
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780.0 


516.8 
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145.6 474.0 0.3 

814.1 2717.5 0.3 
7715.8 26048.3 0.3 
238.5 834.4 0.3 

13321.2 46691.6 0.3 
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Table 8. Gene expressions in DM800 and DM803 when exposed to MMC 
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Table 9 

Most highly expressed genes in Synechocystis sp. PCC6803 in minimal growth media
(BG11 + 5 mM glucose).

Systematic   Gene     Function                                          Transcript copy in
Name                                                                    total mRNA
                                                                        (average copy = 1)

slr2051      cpcG     phycobilisome rod-core linker polypeptide CpcG    64.91
sll1580      cpcC     phycocyanin associated linker protein             22.71
slr0447      amiC     negative aliphatic amidase regulator              19.45
sll1070      tktA     transketolase                                     19.24
sll0018      cbbA     fructose-1,6-bisphosphate aldolase                14.27
slr0011      rbcX     ND*                                               12.00
ssl0563      psaC     photosystem I subunit VII                         11.31
slr1655      psaL     photosystem I subunit XI                          10.91
sll0819      psaF     photosystem I subunit III                         10.56
sll1867      psbA3    photosystem II D1 protein                         10.43
sll1324      atpF     ATP synthase subunit b                            10.37
sll1746      rpl12    50S ribosomal protein L12                         10.13
sll1099      tufA     protein synthesis elongation factor Tu            9.48
slr0009      rbcL     ribulose bisphosphate carboxylase large subunit   8.39
slr0012      rbcS     ribulose bisphosphate carboxylase small subunit   8.14
sll1326      atpA     ATP synthase α subunit                            7.72
slr1908               ND*                                               7.62
sll1578      cpcA     phycocyanin α subunit                             7.60
slr2067      apcA     allophycocyanin α chain                           7.51
slr2052               ND*                                               7.41
sll1184      ho       heme oxygenase                                    7.27
ssl3437      rps17    30S ribosomal protein S17                         7.26
sll1786               hypothetical protein (ND*)                        7.16
ssl0020      petF     ferredoxin                                        7.07
sll1812      rps5     30S ribosomal protein S5                          7.04

* ND = not determined
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Table 10 

Most highly induced genes in Synechocystis sp. PCC6803 in BG11 media containing 5 mM
glucose with 20 min of UV-B treatment at 20 µE s⁻¹ m⁻² intensity.

Systematic   Gene     Function                                     Data/Control   STD
Name

ssr2595      hliB     high light-inducible protein                 22.7           4.7
slr1544               ND*                                          15.5           7.6
sll0528               ND*                                          12.1           3.9
sll1514      hsp17    small heat shock protein                     9.9            3.9
slr1687      nblB     phycobilisome degradation protein NblB       8.2            1.9
sll1483               transforming growth factor induced protein   7.8            2.2
sll2012      rpoD     RNA polymerase sigma factor                  6.3            2.0
ssl1633               CAB/ELIP/HLIP superfamily                    6.0            1.0
ssl2542      hliA     high light-inducible protein                 5.6            1.6
sll0846               ND*                                          [illegible]    [illegible]
slr1674               ND*                                          4.7            1.8
slr1604      ftsH     chloroplast associated protease FtsH         4.6            1.9
slr0320               ND*                                          4.5            2.2
sll0306      rpoD     RNA polymerase sigma factor                  4.4            1.0
slr0228      ftsH     cell division protein FtsH                   4.3            1.7
slr1641      clpB     ClpB protein                                 4.3            1.1
ssr2016               ND*                                          4.2            2.2
sll1867      psbA3    photosystem II D1 protein                    4.1            0.3

* ND = not determined






WO 01/29261 PCT/US00/28352 

CLAIMS 

What is claimed is: 

1. A method for identifying gene expression changes within a bacterial
species comprising:

(a) providing a comprehensive micro-array synthesized from DNA
comprised in a bacterial species;

(b) generating a first set of labeled probes from bacterial RNA, the
RNA isolated from the bacterial species of step (a);

(c) hybridizing the first set of labeled probes of step (b) to the
comprehensive micro-array of step (a), wherein hybridization
results in a detectable signal generated from the labeled probe;

(d) measuring the signal generated by the hybridization of the first
set of labeled probes to the comprehensive micro-array of
step (c);

(e) subjecting the bacterial species of step (a) to a gene expression
altering condition whereby the gene expression profile of the
bacterial species is altered to produce a modified bacterial
species;

(f) generating a second set of labeled probes from bacterial RNA,
the RNA isolated from the modified bacterial species of step (e);

(g) hybridizing the second set of labeled probes of step (f) to the
comprehensive micro-array of step (a), wherein hybridization
results in a detectable signal generated from the labeled probe;

(h) measuring the signal generated by the hybridization of the
second set of labeled probes to the comprehensive micro-array
of step (g); and

(i) comparing signal generated from the first hybridization to the
signal generated from the second hybridization to identify gene
expression changes within a bacterial species.

2. A method for identifying gene expression changes within a bacterial
species comprising:

(a) providing a comprehensive micro-array synthesized from DNA
comprised in a bacterial species;

(b) generating a first set of fluorescent cDNA from bacterial RNA,
the RNA isolated from the bacterial species of step (a);

(c) hybridizing the first set of fluorescent cDNA of step (b) to the
comprehensive micro-array of step (a), wherein hybridization
results in a detectable signal generated from the fluorescent
cDNA;

(d) measuring the signal generated by the hybridization of the first
set of fluorescent cDNA to the comprehensive micro-array of
step (c);

(e) subjecting the bacterial species of step (a) to a gene expression
altering condition whereby the gene expression profile of the
bacterial species is altered to produce a modified bacterial
species;

(f) generating a second set of fluorescent cDNA from bacterial
RNA, the RNA isolated from the modified bacterial species of
step (e);

(g) hybridizing the second set of fluorescent cDNA of step (f) to the
comprehensive micro-array of step (a), wherein hybridization
results in a detectable signal generated from the fluorescent
cDNA;

(h) measuring the signal generated by the hybridization of the
second set of fluorescent cDNA to the comprehensive
micro-array of step (g); and

(i) comparing signal generated from the first hybridization to the
signal generated from the second hybridization to identify gene
expression changes within a bacterial species.

3. A method according to either Claim 1 or 2 wherein the bacterial
species is selected from the group consisting of enteric bacteria, Bacillus,
Acinetobacter, Streptomyces, Methylobacter, Pseudomonas, Rhodobacter and
Synechocystis.

4. A method according to either Claim 1 or 2 wherein the signal
generating label is selected from the group consisting of fluorescent moieties,
chemiluminescent moieties, particles, enzymes, and radioactive tags.

5. A method according to Claim 4 wherein the signal generating label is
a fluorescent moiety and is selected from the group consisting of Cy3 and Cy5.

6. A method according to either Claim 1 or 2 wherein the comprehensive
micro-array contains at least 75% of all open reading frames in the bacterial
species.

7. A method according to Claim 6 wherein the comprehensive
micro-array contains from about 2000 to about 6000 open reading frames.

8. A method according to either Claim 1 or 2 wherein the gene
expression altering condition is selected from the group consisting of a condition
altering the genotype of the bacterial species, a condition altering the growth of
the bacterial species, exposure to mutagens, antibiotics, UV light, gamma-rays,
x-rays, phage, macrophages, organic chemicals, inorganic chemicals,
environmental pollutants, heavy metals, changes in temperature, changes in pH,
conditions producing oxidative damage, DNA damage, anaerobiosis, depletion or
addition of nutrients, addition of a growth inhibitor, and desiccation.

9. A method for identifying gene expression changes within a genome
comprising:

(a) providing a comprehensive micro-array synthesized from DNA
comprised in a prokaryotic or eukaryotic species;

(b) generating a control set of fluorescent cDNA from total or
polyadenylated RNA, the RNA isolated from the species of
step (a), the fluorescent cDNA comprising at least one first
fluorescent label and at least one different second fluorescent
label;

(c) mixing the control set of fluorescent cDNA labeled with the at
least one first label with the control set of fluorescent cDNA
labeled with the at least one second label to form a dual labeled
control cDNA;

(d) hybridizing the dual labeled control set of fluorescent cDNA of
step (c) to the comprehensive micro-array of step (a), wherein
hybridization results in a detectable signal generated from the
fluorescent cDNA;

(e) measuring the signal generated by the hybridization of the dual
labeled control set of fluorescent cDNA to the comprehensive
micro-array of step (c);

(f) subjecting the prokaryote or eukaryote of step (a) to a gene
expression altering condition whereby the gene expression
profile of the prokaryote or eukaryote is altered to produce a
modified prokaryote or eukaryote;

(g) generating an experimental set of fluorescent cDNA from total
or polyadenylated RNA, the RNA isolated from the modified
prokaryote or eukaryote of step (e), the fluorescent cDNA
comprising the first fluorescent label and the different second
fluorescent label of step (b);

(h) mixing the experimental set of fluorescent cDNA labeled with
the at least one first label with the experimental set of
fluorescent cDNA labeled with the at least one second label to
form a dual labeled experimental cDNA;

(i) hybridizing the experimental set of fluorescent cDNA of
step (h) to the comprehensive micro-array of step (a), wherein
hybridization results in a detectable signal generated from the
fluorescent cDNA;

(j) measuring the signal generated by the hybridization of the
second set of fluorescent cDNA to the comprehensive
micro-array of step (g); and

(k) comparing signal generated from the dual labeled control
hybridization with the dual labeled experimental hybridization
to identify gene expression changes within a prokaryotic or
eukaryotic species.

10. A method according to Claim 9 wherein the first fluorescent label and
the second fluorescent label are independently selected from the group consisting of
Cy3 and Cy5.

11. A method according to Claim 9 wherein the prokaryotic or eukaryotic
genome is comprised within an organism selected from the group consisting of
enteric bacteria, Bacillus, Acinetobacter, Streptomyces, Methylobacter,
Pseudomonas, cyanobacteria, yeasts, filamentous fungi, plant cells and animal
cells.

12. A method according to Claim 11 wherein yeast are selected from the
group consisting of Saccharomyces, Zygosaccharomyces, Kluyveromyces,
Candida, Hansenula, Debaryomyces, Mucor, Pichia and Torulopsis.

13. A method according to Claim 11 wherein cyanobacteria are selected
from the group consisting of Rhodobacter and Synechocystis.

14. A method according to Claim 11 wherein filamentous fungi are
selected from the group consisting of Aspergillus and Arthrobotrys.

15. A method for quantitating the amount of protein specifying RNA
contained within a genome comprising:

(a) providing a comprehensive micro-array comprising a
multiplicity of open reading frames synthesized from genomic
DNA comprised in a prokaryotic or eukaryotic organism;

(b) generating a set of fluorescent cDNA from total or
polyadenylated RNA isolated from the prokaryotic or eukaryotic
organism of step (a);

(c) generating a set of fluorescent DNA from genomic DNA
isolated from the prokaryotic or eukaryotic organism of step (a);

(d) hybridizing the fluorescent cDNA of step (b) to the
comprehensive micro-array of step (a), wherein hybridization
results in a first fluorescent signal generated from the fluorescent
cDNA for each open reading frame;

(e) hybridizing the fluorescent DNA of step (c) to the
comprehensive micro-array of step (a), wherein hybridization
results in a second fluorescent signal generated from the
fluorescent DNA for each open reading frame; and

(f) dividing, for each open reading frame, the first fluorescent signal
into the second fluorescent signal to provide a quantitated
measure of the amount of protein specifying RNA for each open
reading frame.

16. A method for quantitating the amount of protein specifying RNA
contained within a genome comprising:

(a) providing a comprehensive micro-array comprising a
multiplicity of genes synthesized from genomic DNA
comprised in a prokaryotic or eukaryotic organism;

(b) generating a set of fluorescent cDNA from total or
polyadenylated RNA isolated from the prokaryotic or eukaryotic
organism of step (a);

(c) generating a set of fluorescent DNA from genomic DNA
isolated from the prokaryotic or eukaryotic organism of step (a);

(d) hybridizing the fluorescent cDNA of step (b) to the
comprehensive micro-array of step (a), wherein hybridization
results in a first fluorescent signal generated from the
fluorescent cDNA for each gene;

(e) hybridizing the fluorescent DNA of step (c) to the
comprehensive micro-array of step (a), wherein hybridization
results in a second fluorescent signal generated from the
fluorescent DNA for each gene; and

(f) dividing, for each open reading frame, the first fluorescent signal
into the second fluorescent signal to provide a quantitated
measure of the amount of protein specifying RNA for each gene.

17. A method for identifying gene expression changes within a bacterial
species according to either Claim 1 or 2 providing for quantitating the amount of
protein specifying RNA contained within a genome according to either Claim 15
or 16.




18. A method for identifying gene expression changes within a genome 
according to Claim 8 providing for quantitating the amount of protein specifying 
RNA contained within a genome according to Claim 15 or 16. 
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